Bug 89034

Summary: Firefox crashing xserver and some major rendering bugs
Product: Mesa
Reporter: smoki <smoki00790>
Component: Drivers/Gallium/radeonsi
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED
QA Contact: Default DRI bug account <dri-devel>
Severity: critical
Priority: medium
CC: alexandre.f.demers, daniel, grantipak, kai, smoki00790
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments: valley_artifacts
xcrash
gdb.txt
stderr.txt
attachment_1_Xorg
attachment_2_Dota2
Patch to re-enable subreg liveness
valley.png
stacking.png
subreg_disabled.txt
subreg_enabled.txt
Possible fix
subreg_enabled2.txt
hof.png

Description smoki 2015-02-09 06:08:29 UTC
Created attachment 113269 [details]
valley_artifacts

 Well, I filed this as a radeonsi bug as I am not sure whether it happens elsewhere. It is actually an LLVM bug which appears once subreg liveness is enabled; svn 228228 is bisected as bad. I am running current LLVM with subreg liveness disabled, as this is a major/grave bug for me.

 The same issue was present in Tom's perf branches last month once subreg liveness was enabled, too.

 Hardware is Kabini (Athlon 5350), current Debian Sid 64-bit, kernel 3.19.0, xserver git, mesa git, etc...

 Attached is a screenshot from Unigine Valley as an example; there are major rendering issues in many other GL apps.

 As for Firefox crashing the X server, I am not sure how to debug that (btw, it crashes X immediately when FF starts). If I build LLVM with debug and assertions, the screen/monitor looks like it goes into sleep mode (without any messages in the logs) and only a hard reset helps. If I build LLVM without those, it just crashes the X server, but there is not enough info in Xorg.0.log :(
Comment 1 Michel Dänzer 2015-02-09 06:22:13 UTC
(In reply to smoki from comment #0)
>  Attached is screenshot from Unigine Valley for example, there are major
> rendering issues in many other GL apps.

Haven't seen such artifacts on Kaveri. Can you attach the stderr output of running Valley or another affected app with R600_DEBUG=vs,ps with and without the bisected commit?


>  For Firefox crashing xserver not sure how to debug that [...]

Is there something about it in the Xorg stderr output? It should be captured in a gdm log file.
Comment 2 smoki 2015-02-09 06:40:11 UTC
 Well, I can, but first I need to make a bottle of coffee because one LLVM build takes 30 minutes on Kabini :D. But OK, will do something... later...

 I don't have gdm.log as I don't use gdm; plain startx is used. I can only attach Xorg.0.log without a debug build... but well, later...
Comment 3 Michel Dänzer 2015-02-09 06:52:07 UTC
(In reply to smoki from comment #2)
>  Well i can but i need first to make a bottle of coffee because one llvm
> build takes 30 minutes on Kabini :D.

It shouldn't take that long after switching between the bisected commit and the one before it (or just re-applying/reverting it on top of whatever later commit you may have built last). If you're not using ccache yet, that might help a little as well.


>  I don't have gdm.log as i don't use it, plain startx is used.

Then something like

startx [...] 2>stderr.txt

should capture the stderr output in a file.
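
For anyone scripting this step, the same redirection can be sketched in Python. This is only an illustration under stated assumptions: the helper name run_with_stderr_log is made up here, and the startx call in the comment simply mirrors the command and the R600_DEBUG value mentioned in this thread.

```python
import os
import subprocess


def run_with_stderr_log(cmd, log_path, extra_env=None):
    """Run cmd and write its stderr to log_path, like `cmd 2>log_path`.

    run_with_stderr_log is a hypothetical helper name, not part of any tool.
    Returns the exit code of the command.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    with open(log_path, "wb") as log:
        return subprocess.call(cmd, stderr=log, env=env)


# On the affected machine the call might look like this (not executed here):
#   run_with_stderr_log(["startx"], "stderr.txt", {"R600_DEBUG": "vs,ps"})
```

Passing the environment explicitly keeps the debug variable scoped to that one run instead of the whole shell session.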
Comment 4 smoki 2015-02-09 07:56:28 UTC
Created attachment 113272 [details]
xcrash


 Posting this without the patched LLVM, as I can't use the proper LLVM with a browser :D

 This is with a debug LLVM build, but without asserts enabled. Likely an assertion there keeps something from being logged, but dunno; anyway, it might be useful.
Comment 5 Michel Dänzer 2015-02-09 08:04:32 UTC
Please get a backtrace of the crash with gdb by attaching gdb to the Xorg process via ssh before starting Firefox.
Comment 6 smoki 2015-02-09 08:40:25 UTC
(In reply to Michel Dänzer from comment #3)
> 
> Then something like
> 
> startx [...] 2>stderr.txt
> 
> should capture the stderr output in a file.

 Actually that one did capture something. This is with debug+assertion-enabled LLVM, when the monitor just goes to "sleep" after starting Firefox... seems useful :) ?

 X: TargetRegisterInfo.cpp:189: virtual const llvm::TargetRegisterClass* llvm::TargetRegisterInfo::getMatchingSuperRegClass(const llvm::TargetRegisterClass*, const llvm::TargetRegisterClass*, unsigned int) const: Assertion `A && B && "Missing register class"' failed.
Comment 7 Michel Dänzer 2015-02-09 08:54:54 UTC
Please get a gdb backtrace (bt full) for the assertion failure then. Bonus points for running Xorg with R600_DEBUG=vs,ps and grabbing its stderr output leading up to the assertion failure as well.
Comment 8 smoki 2015-02-09 09:33:37 UTC
 Well i think i can't do that, because i don't have another machine right now.
Comment 9 Michel Dänzer 2015-02-10 01:29:01 UTC
Alternatively, you can try using a script for gdb's --command option, something like:

set logging on
set logging redirect
handle SIGPIPE nostop noprint
continue
bt full
continue
quit

That should capture the backtrace in a file called gdb.txt. See http://wiki.x.org/wiki/Development/Documentation/ServerDebugging/#index6h2 for more background.
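
As a rough sketch of how the script above could be put to use without a second machine, the Python below writes those same commands to a file and builds a gdb command line that attaches to an already running process. The function names are invented for illustration, and the PID of the Xorg process is assumed to be known already; --command and --pid are standard gdb options.

```python
# The gdb commands quoted in the comment above, one per line.
GDB_COMMANDS = [
    "set logging on",
    "set logging redirect",
    "handle SIGPIPE nostop noprint",
    "continue",
    "bt full",
    "continue",
    "quit",
]


def write_gdb_script(path):
    """Write the command list to path and return the path (hypothetical helper)."""
    with open(path, "w") as f:
        f.write("\n".join(GDB_COMMANDS) + "\n")
    return path


def gdb_attach_argv(script_path, pid):
    """Build the argv that attaches gdb to pid and runs the script (hypothetical helper)."""
    return ["gdb", "--command=" + script_path, "--pid=" + str(pid)]
```

The resulting argv would typically have to be run as root before Firefox is started, since gdb needs permission to attach to the X server.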
Comment 10 smoki 2015-02-10 10:22:52 UTC
Created attachment 113309 [details]
gdb.txt


 Hopefully i did it fine :) gdb.txt attached...
Comment 11 smoki 2015-02-10 10:23:52 UTC
Created attachment 113310 [details]
stderr.txt


 ...and stderr.txt
Comment 12 smoki 2015-02-10 10:30:06 UTC
 These seem interesting:

err = 0x7f6cbf5538f0 <error: Cannot access memory at address 0x7f6cbf5538f0>
buffer_data = 0x25 <error: Cannot access memory at address 0x25>
Comment 13 Nick Sarnie 2015-02-11 04:41:09 UTC
I'm getting this issue also. I thought it was my own fault because it was so strange. I'm on Linux Mint, Kernel 3.19 and the crash happens when using LLVM 3.7 on Xserver 1.16 and git.
Comment 14 Lorenzo Bona 2015-02-11 10:21:34 UTC
I'm using LLVM from the Debian/Ubuntu nightly builds and I'm experiencing much the same problem.

I built mesa against llvm3.7svn228689 yesterday evening.
KDE starts OK with startx, but as soon as I open a window (terminal, dolphin, firefox or whatever) X crashes.

You can see my xserver crash log (attachment 1).

The last good build was on llvm3.7~svn227765, which is from around the 2nd of February (the nightly builds struggled with the 32-bit build since then, until yesterday afternoon).
I'm also facing many corruptions in Dota2 on the 227765 build (attachment 2).

Switched back to llvm3.6rc2-2 and it's OK now: X doesn't crash and the Dota2 corruptions are gone.

Sorry, but I can't bisect using nightly packages (I'm not able to build a .deb from svn).
Comment 15 Lorenzo Bona 2015-02-11 10:22:07 UTC
Created attachment 113343 [details]
attachment_1_Xorg
Comment 16 Lorenzo Bona 2015-02-11 10:22:45 UTC
Created attachment 113344 [details]
attachment_2_Dota2
Comment 17 Lorenzo Bona 2015-02-11 10:26:30 UTC
Sorry, I forgot some info:

mesa/xserver/ddx/drm from git
kernel drm-fixes-3.19
GPU: R7-265
Comment 18 Tom Stellard 2015-02-11 18:28:15 UTC
I have just committed a change to LLVM svn that disables sub-reg liveness and filed an LLVM bug for this:

http://www.llvm.org/bugs/show_bug.cgi?id=22548
Comment 19 Lorenzo Bona 2015-02-12 08:12:06 UTC
Thank you Tom, with the latest changes the crashes and the Valley rendering issue are gone for me.
BTW I'm still facing rendering issue in Dota2.

Performance with LLVM-3.7 is great, about 30 FPS in Valley. Nice.
Comment 20 Daniel Scharrer 2015-02-12 08:50:05 UTC
(In reply to Lorenzo Bona from comment #19)
> BTW I'm still facing rendering issue in Dota2.

That looks a lot like bug #88978. You could try the trace posted there to see if you are experiencing the same issue.
Comment 21 Kai 2015-02-12 16:01:21 UTC
*** Bug 89045 has been marked as a duplicate of this bug. ***
Comment 22 Tom Stellard 2015-02-21 00:33:58 UTC
Created attachment 113709 [details] [review]
Patch to re-enable subreg liveness

I think this bug has been fixed. This patch re-enables subreg liveness. Can you see if the issue still exists with this patch applied to LLVM git?
Comment 23 smoki 2015-02-21 08:42:34 UTC
Created attachment 113715 [details]
valley.png

(In reply to Tom Stellard from comment #22)
> Created attachment 113709 [details] [review]
> Patch to re-enable subreg liveness
> 
> I think this bug has been fixed.  This patch re-enables subreg livess.  Can
> you see if the issue still exists with this patch applied to LLVM git.

 Just tried it on top of svn230129... Firefox does not crash the X server anymore, but rendering is still broken. It is mostly fine now in Valley, and the "half picture" broken rendering (https://bugs.freedesktop.org/attachment.cgi?id=113269) is also fixed, but black squares still appear here and there - see attachment. Basically the Firefox X server crash is fixed, but rendering in games is not... and I have some other examples where rendering is much worse.
Comment 24 smoki 2015-02-21 08:49:40 UTC
Created attachment 113716 [details]
stacking.png


 In Stacking game (as another example) rendering is also borked but differently, and so on...
Comment 25 Lorenzo Bona 2015-02-21 09:08:40 UTC
(In reply to smoki from comment #24)
> Created attachment 113716 [details]
> stacking.png
> 
> 
>  In Stacking game (as another example) rendering is also borked but
> differently, and so on...

Have you already tried this patch from Marek?
http://cgit.freedesktop.org/mesa/mesa/commit/?id=7692704b144b2aa9a57767a43212ceb5aad6638a

Rendering issues in Dota2 are mostly gone now; sometimes you can see a little glitch here and there, but very rarely.
Comment 26 smoki 2015-02-21 09:15:51 UTC
 @ Lorenzo Bona

 That is for SI, I am on CIK... this issue is a whole other one and probably affects all chips.

 @Tom

 There are also ~140 piglit quick tests that fail once subreg liveness is enabled.
Comment 27 Daniel Scharrer 2015-02-21 13:38:50 UTC
Also no X server crashes here on TAHITI with LLVM r230124 + your patch.
Comment 28 Tom Stellard 2015-02-23 18:41:06 UTC
(In reply to smoki from comment #26)
>  @ Lorenzo Bona
> 
>  That is for SI, i am on CIK... this issue whole another one, probably
> affect all chips.
> 
>  @Tom
> 
>  There are also ~140 piglit quick tests failed once subreg liveness is
> enabled.

Which piglit tests regress and what GPU do you have?
Comment 29 smoki 2015-02-23 23:13:08 UTC
(In reply to Tom Stellard from comment #28)

> Which piglit tests regress and what GPU do you have?

 Kabini. I did a fresh piglit run now and it shows 159 regressions... too many to list, so I uploaded an HTML summary:

 https://dl.dropboxusercontent.com/u/74553632/compare.tar.bz2
Comment 30 Michel Dänzer 2015-02-24 06:59:10 UTC
(In reply to smoki from comment #29)
> I did fresh piglit run now and it shows there are 159 regressed now...

I think at least the piglit regressions aren't directly related to sub-register liveness and should be tracked in a separate bug report:

On my Kaveri, I've been seeing random failures of some (of the same as yours) piglit tests recently (with sub-register liveness disabled). The only way I've found to avoid those failures is to keep rebooting until I get lucky. It seems like some recent change (most likely in Mesa?) causes the hardware to go into a weird, semi-persistent state.

I'm afraid it might be tricky to bisect that, but it would be very helpful.
Comment 31 smoki 2015-02-24 13:14:59 UTC
(In reply to Michel Dänzer from comment #30)
> I think at least the piglit regressions aren't directly related to
> sub-register liveness and should be tracked in a separate bug report:

 Those regressions are only reproducible here with subreg liveness enabled.

> On my Kaveri, I've been seeing random failures of some (of the same as
> yours) piglit tests recently (with sub-register liveness disabled). The only
> way I've found to avoid those failures is to keep rebooting until I get
> lucky. It seems like some recent change (most likely in Mesa?) causes the
> hardware to go into a weird, semi-persistent state.
> 
> I'm afraid it might be tricky to bisect that, but it would be very helpful.

 That sounds like a separate issue, but I don't see it and can't reproduce it on Kabini. I only have a few known tests that sometimes fail at random, but those end up under "warn" and there are just a few of them (I am talking about just 1-3 tests); no "fail" results happen at random here.
Comment 32 Tom Stellard 2015-02-24 20:20:37 UTC
(In reply to smoki from comment #31)
> (In reply to Michel Dänzer from comment #30)
> > I think at least the piglit regressions aren't directly related to
> > sub-register liveness and should be tracked in a separate bug report:
> 
>  Those regressions are only reproducable here with sub reg liveness enabled.
> 
> > On my Kaveri, I've been seeing random failures of some (of the same as
> > yours) piglit tests recently (with sub-register liveness disabled). The
> > only way I've found to avoid those failures is to keep rebooting until I
> > get lucky. It seems like some recent change (most likely in Mesa?) causes
> > the hardware to go into a weird, semi-persistent state.
> > 
> > I'm afraid it might be tricky to bisect that, but it would be very helpful.
> 
>  That sounds like a separate one, but i don't have that and can't reproduce
> it on Kabini. I only have some known of those sometimes fails at random, but
> those are under "warn" and just few of them (i am talking about just 1-3
> tests), but no "fail" tests happens here at random.

Would you be able to set the environment variable R600_DEBUG=ps,vs, run the glsl-fs-min test with the good and the bad commit, and post the output?

R600_DEBUG=ps,vs ./bin/shader_runner tests/shaders/glsl-fs-min.shader_test -auto
Comment 33 smoki 2015-02-24 21:13:40 UTC
 glsl-fs-min is actually one of the randomly failing tests; it sometimes passes and sometimes fails, with or without subreg liveness, so I think that is not the problem here.

 Currently I have 7 warns, 1 test which causes a GPU fault, 4 which crash/segfault and 22 which fail at random. 18 of the randomly failing ones (mostly on the second piglit run) are EXT_transform_feedback/xyz tests, 4 are some glsl tests, etc...

 In total that is 34 potentially problematic tests. With all of those excluded from the run, this is what I get - 136 tests which fail with subreg liveness enabled:

 https://dl.dropboxusercontent.com/u/74553632/compare2.tar.bz2
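
The filtering described above (drop the tests known to be flaky, then keep only those that pass without subreg liveness and fail with it) could be sketched roughly as follows. The dictionaries are a made-up simplification, not piglit's real result format, and apart from copy-pixels and glsl-fs-min the test names are invented.

```python
def consistent_regressions(baseline, candidate, flaky):
    """Return sorted names that pass in baseline, fail in candidate and are not flaky."""
    return sorted(
        name
        for name, status in candidate.items()
        if status == "fail"
        and baseline.get(name) == "pass"
        and name not in flaky
    )


# Made-up miniature data; a real piglit run has thousands of entries.
baseline = {"copy-pixels": "pass", "glsl-fs-min": "pass", "tf-demo": "pass", "old-bug": "fail"}
candidate = {"copy-pixels": "fail", "glsl-fs-min": "fail", "tf-demo": "pass", "old-bug": "fail"}
flaky = {"glsl-fs-min"}  # fails at random with or without subreg liveness

print(consistent_regressions(baseline, candidate, flaky))  # → ['copy-pixels']
```

Tests that already failed in the baseline are left out on purpose, since they say nothing about the change being compared.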
Comment 34 Michel Dänzer 2015-02-25 09:02:39 UTC
(In reply to smoki from comment #33)
>  glsl-fs-min is one of the random failing tests actually, it sometimes pass
> sometimes fail with or without subreg liveness, so that is not problem here
> i think.

I can't reproduce random failures with glsl-fs-min nor any piglit regressions with sub-register liveness enabled, but sub-register liveness doesn't seem to result in any code difference for glsl-fs-min anyway.


Can you find another test which consistently passes without sub-register liveness and fails with it *and* shows a difference between them in the R600_DEBUG=vs,ps stderr output, and attach the latter for both cases?
Comment 35 smoki 2015-02-25 09:51:07 UTC
Created attachment 113809 [details]
subreg_disabled.txt

 (In reply to Michel Dänzer from comment #34)
> 
> I can't reproduce random failures with glsl-fs-min nor any piglit
> regressions with sub-register liveness enabled, but sub-register liveness
> doesn't seem to result in any code difference for glsl-fs-min anyway.
> 

 I can't either if I run it alone, so there is no difference; it just fails sometimes in a full piglit run.

> 
> Can you find another test which consistently passes without sub-register
> liveness and fails with it *and* shows a difference between them in the
> R600_DEBUG=vs,ps stderr output, and attach the latter for both cases?

 As I said yesterday in comment 33, I trimmed the list down to only the ones which show this regression, so you can pick any of those 136 tests which show a difference. Let's say:

 R600_DEBUG=vs,ps ./bin/copy-pixels -samples=8 -auto

 Outputs attached, first without...
Comment 36 smoki 2015-02-25 09:52:34 UTC
Created attachment 113810 [details]
subreg_enabled.txt


...second with subreg liveness enabled.
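
One way to compare two such dumps without reading them side by side is a plain unified diff. The sketch below uses Python's difflib; the function name is made up, and the default labels only reuse the attachment names from this thread.

```python
import difflib


def diff_dumps(disabled_text, enabled_text,
               fromfile="subreg_disabled.txt", tofile="subreg_enabled.txt"):
    """Return unified-diff lines between two R600_DEBUG stderr dumps."""
    return list(
        difflib.unified_diff(
            disabled_text.splitlines(),
            enabled_text.splitlines(),
            fromfile=fromfile,
            tofile=tofile,
            lineterm="",
        )
    )
```

An empty result means the two builds generated identical output for that test, which is what was reported for glsl-fs-min when run on its own.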
Comment 37 Tom Stellard 2015-02-25 17:01:09 UTC
After examining the shader dumps one thing that looks suspicious to me is that in the good dump, we have several instructions like this:

image_load v[9:12], 15, 0, 0, 0, 0, 0, 0, 0, v[4:7], s[8:15]

But nothing is ever written to the last component of vaddr: v7

However, in the bad dumps we have:

image_load v[8:11], 15, 0, 0, 0, 0, 0, 0, 0, v[1:4], s[8:15]

And a value is stored in the last component of vaddr: v4 before every image load.
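
To check many image_load lines for the register in question, something like the following sketch could pull out the last component of vaddr. It assumes only the operand layout visible in the two lines quoted above (first VGPR range is the destination, second is vaddr); the function name is invented.

```python
import re

# Matches a VGPR range operand such as "v[4:7]" (but not SGPR ranges like "s[8:15]").
VGPR_RANGE = re.compile(r"\bv\[(\d+):(\d+)\]")


def vaddr_last_component(line):
    """Return the last VGPR of the vaddr operand of an image_load line.

    Assumes the layout quoted above: the first v[a:b] range is the
    destination and the second one is vaddr.
    """
    ranges = VGPR_RANGE.findall(line)
    if len(ranges) < 2:
        raise ValueError("expected destination and vaddr ranges: " + line)
    _first, last = map(int, ranges[1])
    return "v%d" % last


good = "image_load v[9:12], 15, 0, 0, 0, 0, 0, 0, 0, v[4:7], s[8:15]"
bad = "image_load v[8:11], 15, 0, 0, 0, 0, 0, 0, 0, v[1:4], s[8:15]"
print(vaddr_last_component(good), vaddr_last_component(bad))  # → v7 v4
```

Whether that register is written before the load still has to be checked in the surrounding dump; this only identifies which register to look for.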
Comment 38 Tom Stellard 2015-02-25 17:52:57 UTC
Created attachment 113825 [details] [review]
Possible fix

Can you try this patch and see if it helps?
Comment 39 smoki 2015-02-25 18:12:00 UTC
Created attachment 113826 [details]
subreg_enabled2.txt

(In reply to Tom Stellard from comment #38)
> Created attachment 113825 [details] [review]
> Possible fix
> 
> Can you try this patch and see if it helps?

 It still fails; as the dump is now very different, I attached it.
Comment 40 Tom Stellard 2015-03-20 00:33:49 UTC
There have been a few register allocator bugs fixed in LLVM recently, can you re-apply the "Patch to re-enable subreg liveness" to latest LLVM and test again?
Comment 41 smoki 2015-03-20 21:10:09 UTC
 Tried svn232842 with subreg liveness enabled + mesa a04b520890c669ce012b4b18165392dcabe0b27b

 Nothing, still same bugs are there.
Comment 42 Tom Stellard 2015-03-23 17:27:22 UTC
(In reply to smoki from comment #41)
>  Tried svn232842 with subreg liveness enabled + mesa 
> a04b520890c669ce012b4b18165392dcabe0b27b
> 
>  Nothing, still same bugs are there.

I can't reproduce any of these failures on my Verde card.  What is still failing for you? Piglit tests?  If you still see corruption in Unigine Valley, can you post the command you use to launch the program and which scene the corruption occurs in?
Comment 43 smoki 2015-03-23 21:13:38 UTC
Created attachment 114561 [details]
hof.png


 Yes, those piglit tests from comment 33 are still failing. Unigine Valley also still has corruption; I run it via the 'valley' script and then apply some settings via the interface. The squares happen regardless of settings in scenes 1, 2, 3 and 6. In scenes 2 and 3 there are not only black squares; fog also stops rendering correctly at some/far camera positions.

 The Stacking game from comment 24 also has the same borked rendering. The game Hands of Fate also shows rendering issues (screenshot attached)... and so on; many apps are affected once I enable subreg liveness.
Comment 44 Tom Stellard 2015-03-24 13:49:22 UTC
(In reply to smoki from comment #43)
> Created attachment 114561 [details]
> hof.png
> 
> 
>  Yes, those piglit tests from comment 33 still failing. But also Unigine
> Valley still have corruptions, i run it via 'valley' script then apply some
> settings via interface. Squares happens regardless of settings on scenes 1, 2,
> 3 and 6. On 2 and 3 there is not only black squares, but fog also starts to
> not render correctly on some/far camera positions.
> 

Can you try running the piglit tests with no X server and with the environment variable PIGLIT_PLATFORM=gbm? You will need to install waffle from git, enable gbm support, and then rebuild piglit for this to work.

-Tom

>  And also Stacking game from comment 24 have same borked rendering. Game
> Hands of Fate also show rendering issues (screenshot attached)... and so on,
> many apps are affected once i enable subreg liveness.
Comment 45 smoki 2015-03-24 15:12:29 UTC
 If you are asking whether the same tests fail there, then yes - the same tests fail with PIGLIT_PLATFORM=gbm and no X server. And the dump is the same as in our example.

  PIGLIT_PLATFORM=gbm R600_DEBUG=vs,ps ./bin/copy-pixels -samples=8 -auto
Comment 46 smoki 2015-03-24 21:17:49 UTC
 Ah, I forgot to add that comparison anyway... That is piglit on gbm without X, comparing just subreg liveness enabled/disabled:

 https://dl.dropboxusercontent.com/u/74553632/compare11.tar.bz2
Comment 47 Tom Stellard 2015-03-28 01:06:41 UTC
If you enable sub-reg liveness in this branch: http://cgit.freedesktop.org/~tstellar/llvm/log/?h=sched-perf-Mar-27-2015, do you still see the bugs?
Comment 48 smoki 2015-03-28 06:18:51 UTC
(In reply to Tom Stellard from comment #47)
> If you enable sub-reg liveness in this branch:
> http://cgit.freedesktop.org/~tstellar/llvm/log/?h=sched-perf-Mar-27-2015, do
> you still see the bugs?

 In Unigine Valley there is no corruption with that anymore; perf goes down by 5%, just to mention...

 But all the other bugs are still there, like the corruption in the Stacking and Hands of Fate games, and all the same piglit tests still fail.
Comment 49 smoki 2015-05-26 00:55:50 UTC
 Issue fixed in llvm:

  R600/SI: Fix bug with v_interp_p1_f32 instructions on 16 bank lds chips

  The src and dst register cannot be the same on chips with 16 lds banks.

 Closing.
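
The restriction named in that commit message can be written down as a one-line check. This is only a toy model for illustration: the function name and the integer register numbering are invented, and which chips have 16 LDS banks is not stated in this thread, so the bank count is passed in as a parameter.

```python
def violates_interp_p1_constraint(dst_vgpr, src_vgpr, lds_banks):
    """True if v_interp_p1_f32 would use the same VGPR as src and dst
    on a chip with 16 LDS banks (toy model of the rule quoted above)."""
    return lds_banks == 16 and dst_vgpr == src_vgpr


# Same register for source and destination on a 16-bank chip: not allowed.
print(violates_interp_p1_constraint(dst_vgpr=0, src_vgpr=0, lds_banks=16))
# Different registers are fine.
print(violates_interp_p1_constraint(dst_vgpr=1, src_vgpr=0, lds_banks=16))
```

The two calls print True and False respectively.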
