Bug 99799

Summary: Civilization VI makes nouveau crash on register allocation
Product: Mesa Reporter: Gediminas Jakutis <gediminas>
Component: Drivers/DRI/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED QA Contact: Nouveau Project <nouveau>
Severity: normal    
Priority: medium CC: fdsfgs, rhyskidd
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
Attachments: patch that works around the issue

Description Gediminas Jakutis 2017-02-13 20:27:01 UTC
Civilization VI consistently segfaults during register allocation when loading a scenario, regardless of whether it is the actual game or the benchmark mode.
I am attaching an apitrace [1] leading to the aforementioned segfault.
Also attaching a stack trace, a line listing of the crash point, and a register dump [2], all obtained by running the game under gdb.

I should also mention that the game sometimes crashes at random, in a peculiar way, before even reaching the main menu. This report is not about that bug, which has not yet been properly reported. I mention it because replaying the trace sometimes triggers that bug, too. So be advised: if a replay does not get past the main menu, you need to replay the trace again to reach the point of this bug.

[1] https://seriouss.am/etc/civ6-nouveau.trace.xz (apitrace file; xz'ed; 653MiB; 1050MiB uncompressed)
[2] https://seriouss.am/etc/civ6-gdb (plaintext; 5.6KiB)
Comment 1 Gediminas Jakutis 2017-02-13 21:10:13 UTC
A dump of Nouveau's compiler's[?] debug output, complete with shaders and whatnot:
https://seriouss.am/etc/civ6-shaderdump (plaintext; 635KiB; contains ANSI color escape sequences all over)
Comment 2 Ilia Mirkin 2017-02-14 17:43:50 UTC
OK, so this is a previously-known issue. There's another bug filed about it somewhere... Crysis maybe? Anyways, it comes down to a problem with the delete_Instruction() call in the spill code. When deleting the instruction (Instruction::~Instruction), it clears out its own ValueDefs (ValueDef::set), which should in turn update the relevant Value's defs list.

However, this happens in the middle of RA (register allocation), which means that various instructions have been joined into nodes, and entries from value A's defs list end up in value B's defs list.

Now this is where I get confused - when I change the logic to also remove the ValueDef from val->join, this does not help.

Further vexing is the fact that this particular spill shouldn't even be happening in the first place - it's a move between 2 LValues which I'm pretty sure are joined to each other.

Valgrind catches the first badness where this happens, which is when building live sets after spilling happens. Need to add more breaks and poke around more.
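
The failure mode described in this comment can be illustrated with a toy model. This is a minimal sketch, not Mesa's nv50_ir code: `ToyValue`, `ToyDef`, `define`, `coalesce`, `unlinkNaive` and `staleRefs` are all hypothetical names, and the assumption that coalescing moves one value's definitions into the other value's list is a simplification of "joined into nodes". It only shows how an unlink that searches the definition's own value can leave a pointer behind in the list of the value it was joined to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyValue;

// Hypothetical stand-in for a definition slot owned by an instruction.
struct ToyDef {
    ToyValue *value = nullptr;
};

// Hypothetical stand-in for a value during register allocation.
struct ToyValue {
    ToyValue *join = nullptr;    // representative after coalescing, if any
    std::vector<ToyDef *> defs;  // definitions recorded on this value
};

// Record d as a definition of v.
inline void define(ToyDef &d, ToyValue &v) {
    d.value = &v;
    v.defs.push_back(&d);
}

// Coalesce b into a (assumed behaviour): b's definitions migrate to a's list.
inline void coalesce(ToyValue &a, ToyValue &b) {
    a.defs.insert(a.defs.end(), b.defs.begin(), b.defs.end());
    b.defs.clear();
    b.join = &a;
}

// Naive unlink, as a destructor might do: only the def's own value is searched.
inline void unlinkNaive(ToyDef &d) {
    assert(d.value != nullptr);
    std::vector<ToyDef *> &list = d.value->defs;
    list.erase(std::remove(list.begin(), list.end(), &d), list.end());
    d.value = nullptr;
}

// Count how many entries in v.defs still point at d.
inline std::size_t staleRefs(const ToyValue &v, const ToyDef &d) {
    return static_cast<std::size_t>(std::count(v.defs.begin(), v.defs.end(), &d));
}
```

In this model, after `coalesce(a, b)` followed by `unlinkNaive(db)`, the list on `a` still holds a pointer to `db`, so any later walk over `a.defs` would touch a definition whose owner is about to be destroyed. As noted above, also removing the ValueDef from val->join did not fix the real crash, so the actual bug is evidently more involved than this toy.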
Comment 3 Ilia Mirkin 2017-02-16 02:57:38 UTC
Created attachment 129659
patch that works around the issue

The attached patch should work around the issue of spilling a value into itself without invoking the wrath of the underlying bug that caused the whole thing to go south in the first place.

I'm not entirely convinced of this patch's correctness, so it will need some careful testing.
Comment 4 Gediminas Jakutis 2017-02-16 06:16:08 UTC
It now consistently segfaults at a different point instead.
Backtrace and whatnot: https://seriouss.am/etc/civ-gdb-2017-02-16

I should note that NV50_PROG_OPTIMIZE=0 prevents the segfault, just as it does without the patch.
Comment 5 Tobias Klausmann 2017-08-07 15:52:58 UTC
Patch at [1] lets you run Civ6 without disabling optimizations. Please note that this patch will not be upstreamed, though!


[1] https://patchwork.freedesktop.org/patch/169870/
Comment 6 Rhys Kidd 2017-12-17 11:03:05 UTC
As an update, the latest Civ6 (v1.0.0.167) doesn't experience a crash in game with either of the two following graphics stack combinations on GP107M for me:

Mesa 17.2.2 / libdrm 2.4.83 / Kernel 4.15-rc1
Mesa 17.4.0-devel (git-546633dce2) / libdrm 2.4.83 / Kernel 4.15-rc1

I ran both the benchmark mode and also the 'Play Now' option.

Note: Visual corruptions remain - a number of blocks of blue colour and incorrectly clipped visual elements.
Comment 7 Karol Herbst 2017-12-17 14:06:29 UTC
(In reply to Rhys Kidd from comment #6)
> As an update, the latest Civ6 (v1.0.0.167) doesn't experience a crash in
> game with either of the two following graphics stack combinations on GP107M
> for me:
> 
> Mesa 17.2.2 / libdrm 2.4.83 / Kernel 4.15-rc1
> Mesa 17.4.0-devel (git-546633dce2) / libdrm 2.4.83 / Kernel 4.15-rc1
> 
> I ran both the benchmark mode and also the 'Play Now' option.
> 
> Note: Visual corruptions remain - a number of blocks of blue colour and
> incorrectly clipped visual elements.

Well, on Pascal we have 255 registers; on Kepler1, just 63.


(In reply to Gediminas Jakutis from comment #4)
> It now consistently segfaults in another point instead.
> Backtrace and whatnot: https://seriouss.am/etc/civ-gdb-2017-02-16
> 
> I should note that NV50_PROG_OPTIMIZE=0 prevents a segfault, same as without
> the patch.

Yeah, I know what the problem here is. With higher optimization levels we get wide values, each occupying a block of more than one register. Currently we can't spill those values. I tried to fix it, but also kind of failed...
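
To make the register-budget point concrete, here is a rough counting sketch. It is not how the nouveau allocator works: it ignores liveness ranges, alignment and interference, and simply assumes each live value has a width measured in registers and that only width-1 values can be spilled. The constants use the figures quoted above (63 for Kepler1, 255 for Pascal); the names `kKepler1Regs`, `kPascalRegs`, `pressure` and `canFitBySpillingScalars` are made up for illustration.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Register budgets quoted in the thread; the names are illustrative only.
constexpr int kKepler1Regs = 63;
constexpr int kPascalRegs = 255;

// Total registers needed if every live value is resident at once.
inline int pressure(const std::vector<int> &widths) {
    return std::accumulate(widths.begin(), widths.end(), 0);
}

// Toy rule assumed here: only values of width 1 may be spilled, so the
// wide values alone must fit in the register file.
inline bool canFitBySpillingScalars(const std::vector<int> &widths, int regs) {
    int wide = 0;
    for (int w : widths) {
        assert(w >= 1);
        if (w > 1)
            wide += w;
    }
    return wide <= regs;
}
```

Under these assumptions, twenty live values of width 4 need 80 registers. That is within the Pascal budget, but on Kepler1 it exceeds 63 and no amount of scalar spilling brings it back under, which is consistent with the crash showing up on one generation and not the other.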
Comment 8 GitLab Migration User 2019-09-18 20:44:52 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1127.
