Summary: | Firefox crashing xserver and some major rendering bugs | ||
---|---|---|---|
Product: | Mesa | Reporter: | smoki <smoki00790> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | critical | ||
Priority: | medium | CC: | alexandre.f.demers, daniel, grantipak, kai, smoki00790 |
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
valley_artifacts
xcrash
gdb.txt
stderr.txt
attachment_1_Xorg
attachment_2_Dota2
Patch to re-enable subreg liveness
valley.png
stacking.png
subreg_disabled.txt
subreg_enabled.txt
Possible fix
subreg_enabled2.txt
hof.png |
(In reply to smoki from comment #0)
> Attached is screenshot from Unigine Valley for example, there are major
> rendering issues in many other GL apps.

Haven't seen such artifacts on Kaveri. Can you attach the stderr output of running Valley or another affected app with R600_DEBUG=vs,ps, with and without the bisected commit?

> For Firefox crashing xserver not sure how to debug that [...]

Is there something about it in the Xorg stderr output? It should be captured in a gdm log file.

Well, I can, but first I need to make a bottle of coffee because one llvm build takes 30 minutes on Kabini :D. But OK, will do something... later...

I don't have gdm.log as I don't use gdm; plain startx is used. I can only attach Xorg.0.log without a debug build... but well, later...

(In reply to smoki from comment #2)
> Well i can but i need first to make a bottle of coffee because one llvm
> build takes 30 minutes on Kabini :D.

It shouldn't take that long after switching between the bisected commit and the one before it (or just re-applying/reverting it on top of whatever later commit you may have built last). If you're not using ccache yet, that might help a little as well.

> I don't have gdm.log as i don't use it, plain startx is used.

Then something like

  startx [...] 2>stderr.txt

should capture the stderr output in a file.

Created attachment 113272 [details]
xcrash
Using the unpatched llvm to post this, as I can't use the proper llvm with a browser to post this :D
This is with a debug llvm but without asserts enabled. Likely there is an assertion holding something back from being logged, but dunno; anyway, it might be useful.
Please get a backtrace of the crash with gdb by attaching gdb to the Xorg process via ssh before starting Firefox.

(In reply to Michel Dänzer from comment #3)
> Then something like
>
>   startx [...] 2>stderr.txt
>
> should capture the stderr output in a file.

Actually that one did drop something. This is with a debug+assertion-enabled llvm, when the monitor just goes to "sleep" after starting Firefox... seems useful :) ?

X: TargetRegisterInfo.cpp:189: virtual const llvm::TargetRegisterClass* llvm::TargetRegisterInfo::getMatchingSuperRegClass(const llvm::TargetRegisterClass*, const llvm::TargetRegisterClass*, unsigned int) const: Assertion `A && B && "Missing register class"' failed.

Please get a gdb backtrace (bt full) for the assertion failure then. Bonus points for running Xorg with R600_DEBUG=vs,ps and grabbing its stderr output leading up to the assertion failure as well.

Well, I think I can't do that, because I don't have another machine right now.

Alternatively, you can try using a script for gdb's --command option, something like:

  set logging on
  set logging redirect
  handle SIGPIPE nostop noprint
  continue
  bt full
  continue
  quit

That should capture the backtrace in a file called gdb.txt. See http://wiki.x.org/wiki/Development/Documentation/ServerDebugging/#index6h2 for more background.

Created attachment 113309 [details]
gdb.txt
Hopefully I did it right :) gdb.txt attached...
Created attachment 113310 [details]
stderr.txt
...and stderr.txt
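
For anyone repeating this on a single machine, here is a minimal sketch (Python, not taken from the thread) of the gdb command-file approach suggested earlier. It writes the quoted gdb commands to a file and builds a gdb command line that would attach to a running Xorg. The file name xorg-backtrace.gdb, the helper names and the PID 1234 are assumptions for illustration only; the script builds the command line but does not start gdb.

```python
import tempfile
from pathlib import Path

# gdb commands as quoted earlier in this thread, one per line.
GDB_COMMANDS = [
    "set logging on",
    "set logging redirect",
    "handle SIGPIPE nostop noprint",
    "continue",
    "bt full",
    "continue",
    "quit",
]


def write_gdb_script(directory):
    """Write the command file and return its path (file name is hypothetical)."""
    path = Path(directory) / "xorg-backtrace.gdb"
    path.write_text("\n".join(GDB_COMMANDS) + "\n")
    return path


def gdb_argv(xorg_pid, script_path):
    """Build the gdb command line; nothing is executed here."""
    return ["gdb", f"--pid={xorg_pid}", f"--command={script_path}"]


if __name__ == "__main__":
    script = write_gdb_script(tempfile.mkdtemp())
    # 1234 is a placeholder; substitute the real Xorg PID.
    print(" ".join(gdb_argv(1234, script)))
```

Run the printed command as root from a console or ssh session before starting Firefox; with logging enabled as above, the backtrace should end up in gdb.txt in the directory gdb was started from.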
These seem interesting:

  err = 0x7f6cbf5538f0 <error: Cannot access memory at address 0x7f6cbf5538f0>
  buffer_data = 0x25 <error: Cannot access memory at address 0x25>

I'm getting this issue also. I thought it was my own fault because it was so strange. I'm on Linux Mint, kernel 3.19, and the crash happens when using LLVM 3.7 on Xserver 1.16 and git.

I'm using llvm from the debian/ubuntu nightly builds and I'm experiencing much the same problem. I built mesa against llvm3.7svn228689 yesterday evening. KDE starts OK with startx, but as soon as I open a window (terminal, dolphin, firefox or whatever) X crashes. You can see my xserver crash log (attachment 1 [details] [review]).

The last good build was on llvm3.7~svn227765, which is from around the 2nd of February (the nightly builds struggled since then in the 32-bit build, until yesterday afternoon). I'm also facing a lot of corruption on the 227765 build in Dota2 (attachment 2 [details] [review]).

Switched back to llvm3.6rc2-2 and it's OK now: X doesn't crash and the Dota2 corruption is gone.

Sorry, but I can't bisect using nightly packages (I'm not able to build a .deb from svn).

Created attachment 113343 [details]
attachment_1_Xorg
Created attachment 113344 [details]
attachment_2_Dota2
Sorry, I forgot some info:
mesa/xserver/ddx/drm from git
kernel drm-fixes-3.19
GPU: R7-265

I have just committed a change to llvm svn that disables sub-reg liveness and filed an LLVM bug for this: http://www.llvm.org/bugs/show_bug.cgi?id=22548

Thank you Tom, with the latest changes the crashes and the Valley rendering issue are gone for me. BTW, I'm still facing a rendering issue in Dota2. Performance with LLVM-3.7 is great, about 30 FPS in Valley. Nice.

(In reply to Lorenzo Bona from comment #19)
> BTW I'm still facing rendering issue in Dota2.

That looks a lot like bug #88978. You could try the trace posted there to see if you are experiencing the same issue.

*** Bug 89045 has been marked as a duplicate of this bug. ***

Created attachment 113709 [details] [review]
Patch to re-enable subreg liveness

I think this bug has been fixed. This patch re-enables subreg liveness. Can you see if the issue still exists with this patch applied to LLVM git?

Created attachment 113715 [details]
valley.png

(In reply to Tom Stellard from comment #22)
> Created attachment 113709 [details] [review] [review]
> Patch to re-enable subreg liveness
>
> I think this bug has been fixed. This patch re-enables subreg livess. Can
> you see if the issue still exists with this patch applied to LLVM git.

Just tried it on top of svn230129... Firefox does not crash xserver anymore, but rendering is still broken. It is mostly fine now in Valley, and that "half picture" broken rendering (https://bugs.freedesktop.org/attachment.cgi?id=113269) is also fixed... but black squares still appear here and there, see attachment.

Basically the Firefox xserver crash is fixed, but rendering in games is not... and I have some other examples where rendering is much worse.

Created attachment 113716 [details]
stacking.png
In Stacking game (as another example) rendering is also borked but differently, and so on...
(In reply to smoki from comment #24)
> Created attachment 113716 [details]
> stacking.png
>
> In Stacking game (as another example) rendering is also borked but
> differently, and so on...

Have you already tried this patch from Marek?
http://cgit.freedesktop.org/mesa/mesa/commit/?id=7692704b144b2aa9a57767a43212ceb5aad6638a

The rendering issues in Dota2 are mostly gone now; sometimes you can see a little glitch here and there, but very rarely.

@Lorenzo Bona

That is for SI, I am on CIK... this issue is a whole other one, and probably affects all chips.

@Tom

There are also ~140 piglit quick tests that fail once subreg liveness is enabled.

Also, no X server crash here on TAHITI with LLVM r230124 + your patch.

(In reply to smoki from comment #26)
> @ Lorenzo Bona
>
> That is for SI, i am on CIK... this issue whole another one, probably
> affect all chips.
>
> @Tom
>
> There are also ~140 piglit quick tests failed once subreg liveness is
> enabled.

Which piglit tests regress and what GPU do you have?

(In reply to Tom Stellard from comment #28)
> Which piglit tests regress and what GPU do you have?

Kabini.

I did a fresh piglit run now and it shows 159 regressions... too many to list, so I uploaded the html summary:

https://dl.dropboxusercontent.com/u/74553632/compare.tar.bz2

(In reply to smoki from comment #29)
> I did fresh piglit run now and it shows there are 159 regressed now...

I think at least the piglit regressions aren't directly related to sub-register liveness and should be tracked in a separate bug report:

On my Kaveri, I've been seeing random failures of some (of the same as yours) piglit tests recently (with sub-register liveness disabled). The only way I've found to avoid those failures is to keep rebooting until I get lucky. It seems like some recent change (most likely in Mesa?) causes the hardware to go into a weird, semi-persistent state.

I'm afraid it might be tricky to bisect that, but it would be very helpful.

(In reply to Michel Dänzer from comment #30)
> I think at least the piglit regressions aren't directly related to
> sub-register liveness and should be tracked in a separate bug report:

Those regressions are only reproducible here with subreg liveness enabled.

> On my Kaveri, I've been seeing random failures of some (of the same as yours)
> piglit tests recently (with sub-register liveness disabled). The only way I've
> found to avoid those failures is to keep rebooting until I get lucky. It seems
> like some recent change (most likely in Mesa?) causes the hardware to go into a
> weird, semi-persistent state.
> I'm afraid it might be tricky to bisect that, but it would be very helpful.

That sounds like a separate issue, but I don't have it and can't reproduce it on Kabini. I only have a few known tests that sometimes fail at random, but those are under "warn" and there are just a few of them (I am talking about 1-3 tests); no "fail" results happen here at random.

(In reply to smoki from comment #31)
> (In reply to Michel Dänzer from comment #30)
> > I think at least the piglit regressions aren't directly related to
> > sub-register liveness and should be tracked in a separate bug report:
>
> Those regressions are only reproducable here with sub reg liveness enabled.
>
> >On my Kaveri, I've been seeing random failures of some (of the same as yours)
> >piglit tests recently (with sub-register liveness disabled). The only way I've
> >found to avoid those failures is to keep rebooting until I get lucky. It seems
> >like some recent change (most likely in Mesa?) causes the hardware to go into a
> >weird, semi-persistent state.
>
> >I'm afraid it might be tricky to bisect that, but it would be very helpful.
>
> That sounds like a separate one, but i don't have that and can't reproduce
> it on Kabini. I only have some known of those sometimes fails at random, but
> those are under "warn" and just few of them (i am talking about just 1-3
> tests), but no "fail" tests happens here at random.

Would you be able to set the environment variable R600_DEBUG=ps,vs, run the glsl-fs-min test with the good and the bad commit, and post the output?

  R600_DEBUG=ps,vs ./bin/shader_runner tests/shaders/glsl-fs-min.shader_test -auto

glsl-fs-min is actually one of the randomly failing tests; it sometimes passes and sometimes fails, with or without subreg liveness, so I think that is not the problem here.

Currently I have 7 warns, 1 test which caused a GPU fault, 4 that crash/segfault and 22 that fail at random. 18 of the randomly failing ones (mostly on the second piglit run) are EXT_transform_feedback/xyz tests, 4 are glsl tests, etc...

In all, that is 34 potentially problematic tests. With all of those excluded from the run, this is what I get: 136 tests which fail with subreg liveness enabled:

https://dl.dropboxusercontent.com/u/74553632/compare2.tar.bz2

(In reply to smoki from comment #33)
> glsl-fs-min is one of the random failing tests actually, it sometimes pass
> sometimes fail with or without subreg liveness, so that is not problem here
> i think.

I can't reproduce random failures with glsl-fs-min nor any piglit regressions with sub-register liveness enabled, but sub-register liveness doesn't seem to result in any code difference for glsl-fs-min anyway.

Can you find another test which consistently passes without sub-register liveness and fails with it *and* shows a difference between them in the R600_DEBUG=vs,ps stderr output, and attach the latter for both cases?

Created attachment 113809 [details]
subreg_disabled.txt

(In reply to Michel Dänzer from comment #34)
> I can't reproduce random failures with glsl-fs-min nor any piglit
> regressions with sub-register liveness enabled, but sub-register liveness
> doesn't seem to result in any code difference for glsl-fs-min anyway.

I can't either if I run it alone, so there is no difference; it just fails sometimes in a full piglit run.

> Can you find another test which consistently passes without sub-register
> liveness and fails with it *and* shows a difference between them in the
> R600_DEBUG=vs,ps stderr output, and attach the latter for both cases?

As I said yesterday in comment 33, I trimmed the list down to only the ones which show this regression; you can pick any of those 136 tests which show the difference. Let's say:

  R600_DEBUG=vs,ps ./bin/copy-pixels -samples=8 -auto

Outputs attached, first without...

Created attachment 113810 [details]
subreg_enabled.txt
...second with subreg liveness enabled.
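
To spot what changed between two R600_DEBUG dumps like these, a unified diff is usually enough. The following is only a sketch in Python, not something used in this thread; the function name is made up, and the two sample lines are the image_load instructions quoted later in this report, standing in for the full attachments.

```python
import difflib


def diff_dumps(disabled_text, enabled_text):
    """Return a unified diff of two shader dumps as a list of lines."""
    return list(
        difflib.unified_diff(
            disabled_text.splitlines(),
            enabled_text.splitlines(),
            fromfile="subreg_disabled.txt",
            tofile="subreg_enabled.txt",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Single-instruction stand-ins for the real attachments.
    disabled = "image_load v[9:12], 15, 0, 0, 0, 0, 0, 0, 0, v[4:7], s[8:15]"
    enabled = "image_load v[8:11], 15, 0, 0, 0, 0, 0, 0, 0, v[1:4], s[8:15]"
    for line in diff_dumps(disabled, enabled):
        print(line)
```

With real dumps, read the two attachment files and pass their contents in. Register numbers often shift between builds, so expect some noise around the lines that actually matter.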
After examining the shader dumps, one thing that looks suspicious to me is that in the good dump we have several instructions like this:

  image_load v[9:12], 15, 0, 0, 0, 0, 0, 0, 0, v[4:7], s[8:15]

But nothing is ever written to the last component of vaddr: v7

However, in the bad dumps we have:

  image_load v[8:11], 15, 0, 0, 0, 0, 0, 0, 0, v[1:4], s[8:15]

And a value is stored in the last component of vaddr (v4) before every image load.

Created attachment 113825 [details] [review]
Possible fix

Can you try this patch and see if it helps?

Created attachment 113826 [details]
subreg_enabled2.txt

(In reply to Tom Stellard from comment #38)
> Created attachment 113825 [details] [review] [review]
> Possible fix
>
> Can you try this patch and see if it helps?

Still fails; as the dump is now very different, I attached it.

There have been a few register allocator bugs fixed in LLVM recently. Can you re-apply the "Patch to re-enable subreg liveness" to the latest LLVM and test again?

Tried svn232842 with subreg liveness enabled + mesa a04b520890c669ce012b4b18165392dcabe0b27b

Nothing, the same bugs are still there.

(In reply to smoki from comment #41)
> Tried svn232842 with subreg liveness enabled + mesa
> a04b520890c669ce012b4b18165392dcabe0b27b
>
> Nothing, still same bugs are there.

I can't reproduce any of these failures on my Verde card. What is still failing for you? Piglit tests? If you still see corruption in Unigine Valley, can you post the command you use to launch the program and which scene the corruption occurs in?

Created attachment 114561 [details]
hof.png

Yes, those piglit tests from comment 33 are still failing. Unigine Valley also still has corruption; I run it via the 'valley' script, then apply some settings via the interface. The squares happen regardless of settings on scenes 1, 2, 3 and 6. On 2 and 3 there are not only black squares, but fog also stops rendering correctly at some/far camera positions.

The Stacking game from comment 24 also has the same borked rendering. The game Hands of Fate also shows rendering issues (screenshot attached)... and so on; many apps are affected once I enable subreg liveness.

(In reply to smoki from comment #43)
> Created attachment 114561 [details]
> hof.png
>
> Yes, those piglit tests from comment 33 still failing. But also Unigine
> Valley still have corruptions, i run it via 'valey' script then apply some
> setings via interface. Squares happens regradles of settings on scenes 1, 2,
> 3 and 6. On 2 and 3 there is not only black squrares, but fog also starts to
> not render correctly on some/far camera positions.

Can you try running the piglit tests with no X server and with the environment variable PIGLIT_PLATFORM=gbm?

You will need to install waffle from git and enable gbm support, then rebuild piglit for this to work.

-Tom

> And also Stacking game from comment 24 have same borked rendering. Game
> Hands of Fate also show rendering issues (screenshot attached)... and so on,
> many apps are affected once i enable subreg liveness.

If you are asking whether the same tests fail there, then yes: the same tests fail with PIGLIT_PLATFORM=gbm and no xserver. And the dump is the same for our example:

  PIGLIT_PLATFORM=gbm R600_DEBUG=vs,ps ./bin/copy-pixels -samples=8 -auto

Ah, I forgot to add that comparison anyway... This is the no-X gbm piglit run, just with subreg liveness enabled/disabled:

https://dl.dropboxusercontent.com/u/74553632/compare11.tar.bz2

If you enable sub-reg liveness in this branch: http://cgit.freedesktop.org/~tstellar/llvm/log/?h=sched-perf-Mar-27-2015, do you still see the bugs?

(In reply to Tom Stellard from comment #47)
> If you enable sub-reg liveness in this branch:
> http://cgit.freedesktop.org/~tstellar/llvm/log/?h=sched-perf-Mar-27-2015, do
> you still see the bugs?

In Unigine Valley there is no corruption with that anymore; perf goes down by 5%, just to mention... But all the other bugs are still there, like the corruption in the Stacking and Hands of Fate games, and all the same piglit tests still fail.
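
The way the regression list was narrowed down in this thread (drop the known flaky tests, then keep what passes with subreg liveness disabled and fails with it enabled) can be sketched as below. This is an illustration in Python only: the function name, the dict layout and the sample results are assumptions, not piglit's own summary format or API.

```python
def regressions(baseline, candidate, flaky=frozenset()):
    """Tests that pass in baseline but fail in candidate, ignoring flaky ones.

    baseline and candidate map a test name to a status string such as
    "pass", "fail", "warn" or "crash".
    """
    return sorted(
        name
        for name, status in candidate.items()
        if status == "fail"
        and baseline.get(name) == "pass"
        and name not in flaky
    )


if __name__ == "__main__":
    # Hypothetical results; real runs have thousands of entries.
    disabled = {"copy-pixels": "pass", "glsl-fs-min": "pass", "other": "warn"}
    enabled = {"copy-pixels": "fail", "glsl-fs-min": "fail", "other": "warn"}
    print(regressions(disabled, enabled, flaky={"glsl-fs-min"}))
```

Excluding the flaky set first matters here because tests like glsl-fs-min fail at random with or without subreg liveness and would otherwise inflate the count.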
Issue fixed in llvm:

R600/SI: Fix bug with v_interp_p1_f32 instructions on 16 bank lds chips

The src and dst register cannot be the same on chips with 16 lds banks.

Closing. |
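
The constraint named in the fix can be written down as a small check. This is a sketch in Python of the rule as stated in the commit message only, not the actual LLVM change; the function name, the register strings and the value 32 used for other chips are illustrative assumptions.

```python
def interp_p1_operands_ok(dst, src, lds_banks):
    """Check a v_interp_p1_f32 operand pair against the rule quoted above.

    On chips with 16 LDS banks the source and destination registers
    must differ; no restriction is assumed for other bank counts.
    """
    if lds_banks == 16:
        return dst != src
    return True


if __name__ == "__main__":
    print(interp_p1_operands_ok("v0", "v0", 16))  # same register, 16 banks
    print(interp_p1_operands_ok("v1", "v0", 16))  # distinct registers
    print(interp_p1_operands_ok("v0", "v0", 32))  # example non-16-bank value
```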
Created attachment 113269 [details]
valley_artifacts

Well, I filed this as a radeonsi bug as I am not sure if it happens elsewhere. It is actually an LLVM bug which happens once subreg liveness is enabled, so svn 228228 is bisected as bad. I am running current llvm with subreg liveness disabled, as this is a major/grave bug for me.

The same issue was also present in Tom's perf branches last month once subreg liveness was enabled.

Hardware is Kabini (Athlon 5350), current Debian Sid 64bit, kernel 3.19.0, xserver git, mesa git, etc...

Attached is a screenshot from Unigine Valley as an example; there are major rendering issues in many other GL apps.

As for Firefox crashing xserver, I'm not sure how to debug that (btw it crashes X immediately on starting FF). If I build llvm with debug and assertions, the screen/monitor somehow looks like it goes into sleep mode (without any messages in the logs) and only a hard reset helps. If I build llvm without those it just crashes xserver, but there is not enough info in Xorg.0.log :(