Bug 101731 - System freeze with AMDGPU when playing The Witcher 3 (GOG GOTY)
Summary: System freeze with AMDGPU when playing The Witcher 3 (GOG GOTY)
Status: CLOSED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Duplicates: 102797
Depends on:
Blocks:
 
Reported: 2017-07-09 17:59 UTC by Philipp Überbacher
Modified: 2017-11-20 10:42 UTC
CC List: 9 users

See Also:
i915 platform:
i915 features:


Attachments
Save Game to reproduce the bug (770.94 KB, application/zip)
2017-07-09 17:59 UTC, Philipp Überbacher
glxinfo output (101.19 KB, text/plain)
2017-07-09 18:00 UTC, Philipp Überbacher
console output when replaying the apitrace of a crash (281.18 KB, text/plain)
2017-07-11 07:31 UTC, Philipp Überbacher
The Witcher 3 crash save (GOG/GOTY version). (673.77 KB, application/x-7z-compressed)
2017-07-12 06:33 UTC, Shmerl
dmesg when freeze almost happened (17.98 KB, text/plain)
2017-07-30 13:28 UTC, Lennard
Save file near freeze area (Devil's Pit, Velen) (710.05 KB, application/x-7z-compressed)
2017-08-23 01:00 UTC, Shmerl
special varying hack (2.25 KB, patch)
2017-09-19 21:02 UTC, Samuel Pitoiset
Hack patch debug run log (10.55 KB, application/x-xz)
2017-09-20 07:38 UTC, Shmerl
updated special varying hack (2.25 KB, patch)
2017-09-20 08:14 UTC, Samuel Pitoiset
special varying hack backport 17.2.1 (1.87 KB, patch)
2017-09-21 16:32 UTC, Lukas Jirkovsky

Description Philipp Überbacher 2017-07-09 17:59:32 UTC
Created attachment 132575 [details]
Save Game to reproduce the bug

Hi there.

I get reproducible system freezes when playing The Witcher 3. The save game that lets me reproduce this quickly is attached (requires The Witcher 3 with all add-ons).

I reported this bug to Wine first, but as far as we could figure out, it is more likely a bug in Mesa. You can find the Wine bug report here: https://bugs.winehq.org/show_bug.cgi?id=43273

I'm using an AMD RX 460 on Arch Linux with Mesa 17.1.4.

I don't know how to debug this further since I can't do anything as soon as the freeze happens. The game music keeps playing. Sometimes Ctrl+Alt+FX lets me see the TTY, but nothing reacts afterwards and the game music stops.

There is nothing possibly related in the journal or Xorg logs.
Comment 1 Philipp Überbacher 2017-07-09 18:00:09 UTC
Created attachment 132576 [details]
glxinfo output
Comment 2 Clément Guérin 2017-07-10 02:57:12 UTC
Try doing an apitrace and post it here. Like this:

> WINEPREFIX=/path/to/prefix apitrace trace wine witcher3.exe

Then replaying the trace should hang your computer:

> apitrace replay wine64-preloader.trace
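The two steps above can be scripted so that both use the same prefix. A minimal Python sketch, assuming `apitrace` and `wine` are on `PATH` and the script is run from the directory containing `witcher3.exe`; the helper names are hypothetical. The default trace name `wine64-preloader.trace` is taken from the command above (apitrace appears to name the trace after the traced process).

```python
import os
import subprocess
from typing import Dict, List, Tuple


def trace_command(prefix: str, exe: str = "witcher3.exe") -> Tuple[List[str], Dict[str, str]]:
    """Argv and environment for: WINEPREFIX=<prefix> apitrace trace wine <exe>."""
    env = dict(os.environ, WINEPREFIX=prefix)
    return ["apitrace", "trace", "wine", exe], env


def replay_command(trace_file: str = "wine64-preloader.trace") -> List[str]:
    """Argv for: apitrace replay <trace_file>."""
    return ["apitrace", "replay", trace_file]


def record_then_replay(prefix: str, exe: str = "witcher3.exe") -> Tuple[int, int]:
    # Record: run the game under apitrace; quitting the game ends the trace.
    argv, env = trace_command(prefix, exe)
    recorded = subprocess.run(argv, env=env).returncode
    # Replay: re-issue the recorded GL calls, hoping to reproduce the hang.
    replayed = subprocess.run(replay_command()).returncode
    return recorded, replayed
```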
Comment 3 Philipp Überbacher 2017-07-10 06:04:40 UTC
Do you have any suggestion on how to get this trace within reasonable time?

It usually just takes me a few seconds to trigger the bug. As it stands, I get about two frames per minute while tracing, which means it will take me hours to get the trace.

I tried lowering resolution and all gfx settings as far as possible (I still get the bug), but that helped only a little bit.
Comment 4 Philipp Überbacher 2017-07-10 07:51:41 UTC
I've tried this now for about 2 hours and have a 50 GB trace file. No freeze, though. I guess it was about 30 seconds of in-game time running around, which is usually enough to trigger the freeze.
I might have been unlucky, or it might not happen under apitrace. Maybe someone else has more luck.

I might try once more to install the amdgpu-pro drivers and see whether it happens there as well.

I'm open to other suggestions.
Comment 5 Philipp Überbacher 2017-07-11 07:30:29 UTC
I've managed to install amdgpu-pro and that has brought me a bit closer to narrowing this down. Just for reference, the software versions are amdgpu-pro 17.10.401251-2 and related packages (https://aur.archlinux.org/packages/?O=0&K=amdgpu), mesa-noglvnd 17.1.4, xorg-server 1.18.4-1.

With amdgpu-pro I could narrow the freeze down to a specific option in the game: NVIDIA HairWorks. With that option disabled I do not get the freeze. As soon as it is enabled and a game is loaded, the machine freezes.

I've used this to get an apitrace quickly, and I have one that is just 1.1 GB. However, replaying it does not produce the freeze. Maybe the actual freeze trigger didn't make it into the file. I'll provide you the file if you tell me how.
I do have a lot of warnings and errors on console when I replay that file (see console_out).

NVIDIA HairWorks does not trigger the freeze with amdgpu, but it does so immediately with amdgpu-pro. amdgpu triggers the freeze seemingly at random, at least in Velen, but not in White Orchard. amdgpu-pro does not trigger the freeze in Velen (unless HairWorks is enabled, of course).

Since both amdgpu and amdgpu-pro use Mesa, and the non-Mesa proprietary NVIDIA driver does not trigger this bug, it is likely something in Mesa. I hope the above helps to track it down.
Comment 6 Philipp Überbacher 2017-07-11 07:31:29 UTC
Created attachment 132604 [details]
console output when replaying the apitrace of a crash
Comment 7 Shmerl 2017-07-12 05:54:52 UTC
I have the freeze with hairworks disabled all the same.
Comment 8 Shmerl 2017-07-12 05:57:24 UTC
(In reply to Philipp Überbacher from comment #5) 
> I've used this to get a apitrace quickly and I have one with just 1.1 GB.
> However, replaying it does not produce the freeze. Maybe the actual freeze
> trigger didn't make it into the file. I'll provide you the file if you tell
> me how.

You can try this service: https://uploadfiles.io

It's time-limited, but files should stay available for 30 days.
Comment 9 Shmerl 2017-07-12 06:18:13 UTC
I noticed, when I set graphics settings to minimum, this freeze doesn't happen (or at least didn't happen to me so far).
Comment 10 Shmerl 2017-07-12 06:33:46 UTC
Created attachment 132626 [details]
The Witcher 3 crash save (GOG/GOTY version).

With latest Mesa built from source, it now consistently crashes for me on Velen checkpoint save, on max settings (hairworks disabled).

OpenGL renderer string: AMD Radeon RX 480 Graphics (AMD POLARIS10 / DRM 3.10.0 / 4.11.0-1-amd64, LLVM 4.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.2.0-devel (git-f7e78abdf4)
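The renderer and version strings above pack the chip, DRM interface version, kernel, LLVM and Mesa versions into two lines, which helps when comparing configurations across reports. A parsing sketch, assuming the exact layout printed by glxinfo in this thread (other Mesa releases may format it differently); the regexes and function names are illustrative, not part of any tool.

```python
import re
from typing import Dict, Optional

# Layout seen in this thread, e.g.
# "AMD Radeon RX 480 Graphics (AMD POLARIS10 / DRM 3.10.0 / 4.11.0-1-amd64, LLVM 4.0.1)"
RENDERER_RE = re.compile(
    r"\((?:AMD )?(?P<chip>\w+) / DRM (?P<drm>[\d.]+) / (?P<kernel>[^,]+), LLVM (?P<llvm>[\d.]+)\)"
)
# e.g. "4.5 (Core Profile) Mesa 17.2.0-devel (git-f7e78abdf4)"
MESA_RE = re.compile(r"Mesa (?P<mesa>\S+)(?: \((?P<git>git-[0-9a-f]+)\))?")


def parse_renderer(renderer: str) -> Optional[Dict[str, str]]:
    """Split a radeonsi renderer string into chip, DRM, kernel and LLVM fields."""
    m = RENDERER_RE.search(renderer)
    return m.groupdict() if m else None


def parse_mesa_version(version: str) -> Optional[Dict[str, Optional[str]]]:
    """Extract the Mesa version and, for git builds, the commit tag."""
    m = MESA_RE.search(version)
    return m.groupdict() if m else None
```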
Comment 11 Philipp Überbacher 2017-07-12 16:19:25 UTC
(In reply to Shmerl from comment #10)
> Created attachment 132626 [details]
> The Witcher 3 crash save (GOG/GOTY version).
> 
> With latest Mesa built from source, it now consistently crashes for me on
> Velen checkpoint save, on max settings (hairworks disabled).
> 
> OpenGL renderer string: AMD Radeon RX 480 Graphics (AMD POLARIS10 / DRM
> 3.10.0 / 4.11.0-1-amd64, LLVM 4.0.1)
> OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.2.0-devel
> (git-f7e78abdf4)

That's wonderful (in a way). Maybe you can get an apitrace from that?
Comment 12 Shmerl 2017-07-13 01:36:10 UTC
Maybe I'm doing something wrong. I tried to record a trace (using Mesa built from source, which I load using a script).

I recorded a small amount (the start menu first), but when replaying it, I get a black screen and output like this:

2127496 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1228
2127496: warning: unsupported glXSwapBuffersMscOML call
2128642 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1229
2128642: warning: unsupported glXSwapBuffersMscOML call
2130677 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1230
2130677: warning: unsupported glXSwapBuffersMscOML call
2131778 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1231
2131778: warning: unsupported glXSwapBuffersMscOML call
2133839 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929, target_msc = 0, divisor = 0, remainder = 0) = 1232
2133839: warning: unsupported glXSwapBuffersMscOML call
2136933 @3 glXCreateWindow(dpy = 0x7cb2f3b0, config = 0x7cc82380, win = 127926276, attribList = {}) = 121634992
2136933: warning: unsupported glXCreateWindow call
Rendered 0 frames in 6.86555 secs, average of 0 fps

So I'm not sure a full trace would be useful until the replay actually shows something.
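When comparing replay attempts, it can help to tally the unsupported-call warnings and the final frame count. A sketch that assumes the `apitrace replay` line formats shown above; it only summarizes the log and does not change how the replay behaves.

```python
import re
from collections import Counter
from typing import Optional

# "2127496: warning: unsupported glXSwapBuffersMscOML call"
UNSUPPORTED_RE = re.compile(r"^\d+: warning: unsupported (\w+) call$")
# "Rendered 0 frames in 6.86555 secs, average of 0 fps"
RENDERED_RE = re.compile(r"Rendered (\d+) frames in ([\d.]+) secs")


def unsupported_calls(replay_log: str) -> Counter:
    """Count 'unsupported <call>' warnings per GL/GLX entry point."""
    counts: Counter = Counter()
    for line in replay_log.splitlines():
        m = UNSUPPORTED_RE.match(line.strip())
        if m:
            counts[m.group(1)] += 1
    return counts


def rendered_frames(replay_log: str) -> Optional[int]:
    """Frame count from the replay summary line, or None if it is missing."""
    m = RENDERED_RE.search(replay_log)
    return int(m.group(1)) if m else None
```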
Comment 13 Philipp Überbacher 2017-07-13 09:55:06 UTC
(In reply to Shmerl from comment #12)
> May be I'm doing somethin wrong. I tried to record a trace (using Mesa built
> from source which I load using a script).
> 
> I recorded a small amount - starting menu first, but when replaying it, I
> get black screen and such:
> 
> 2127496 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929,
> target_msc = 0, divisor = 0, remainder = 0) = 1228
> 2127496: warning: unsupported glXSwapBuffersMscOML call
> 2128642 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929,
> target_msc = 0, divisor = 0, remainder = 0) = 1229
> 2128642: warning: unsupported glXSwapBuffersMscOML call
> 2130677 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929,
> target_msc = 0, divisor = 0, remainder = 0) = 1230
> 2130677: warning: unsupported glXSwapBuffersMscOML call
> 2131778 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929,
> target_msc = 0, divisor = 0, remainder = 0) = 1231
> 2131778: warning: unsupported glXSwapBuffersMscOML call
> 2133839 @3 glXSwapBuffersMscOML(dpy = 0x7cb2f3b0, drawable = 121634929,
> target_msc = 0, divisor = 0, remainder = 0) = 1232
> 2133839: warning: unsupported glXSwapBuffersMscOML call
> 2136933 @3 glXCreateWindow(dpy = 0x7cb2f3b0, config = 0x7cc82380, win =
> 127926276, attribList = {}) = 121634992
> 2136933: warning: unsupported glXCreateWindow call
> Rendered 0 frames in 6.86555 secs, average of 0 fps
> 
> So not sure if full trace would be useful until it will actually show
> anything.

I've gotten the black screen in my replay attempts too, but I guess that is normal; otherwise the replay would require all the textures and whatnot.
Does your replay trigger the freeze (mine did not)? Maybe you can upload the trace?
Comment 14 Shmerl 2017-07-13 22:13:16 UTC
I didn't get to the freeze point in the replay, but I remember that in the past, when I recorded a trace and replayed it, it actually showed images (i.e. like a video). So I suppose something is wrong with my tracing. But I can record a trace of the crash anyway, just in case.
Comment 15 Shmerl 2017-07-13 22:33:09 UTC
Interestingly, when I record a trace, and it reaches the point where it's supposed to freeze, it doesn't. I.e. the tracing somehow prevents it from happening.
Comment 16 Philipp Überbacher 2017-07-14 05:24:39 UTC
I finally got around to uploading this trace (it should be up for 30 days). Remember that it was with amdgpu-pro and replaying did not cause the freeze. I hope it helps anyway.

https://ufile.io/pb1m8
Comment 17 Shmerl 2017-07-19 03:41:09 UTC
Just for reference, the freeze doesn't happen to me anymore in a newer configuration.

See https://bugs.winehq.org/show_bug.cgi?id=43273#c12
Comment 18 Shmerl 2017-07-21 01:48:52 UTC
Actually, I just experienced the freeze bug again. I guess it's somehow random, and it's not truly gone :(
Comment 19 Lennard 2017-07-29 21:20:32 UTC
I can confirm this happens with radeonsi too
Comment 20 Shmerl 2017-07-30 02:56:12 UTC
(In reply to Lennard from comment #19)
> I can confirm this happens with radeonsi too

Well, most previous reports were about radeonsi.
Comment 21 Lennard 2017-07-30 13:28:50 UTC
Created attachment 133134 [details]
dmesg when freeze almost happened

I was able to save my system by switching around TTYs somehow, checked dmesg and got this.
Using an R7 260X with radeonsi
Comment 22 Shmerl 2017-08-04 03:42:43 UTC
Did anyone try to reproduce this bug with an AMD kernel that supports the display code (i.e. one with Vega support)?
Comment 23 Philipp Überbacher 2017-08-04 06:35:39 UTC
(In reply to Shmerl from comment #22)
> Did anyone try to reproduce this bug with AMD kernel that supports display
> code (i.e. one with Vega support)?

The latest kernel I tried this with is 4.12.3, does that qualify? (mesa 17.1.5, xf86-video-amdgpu 1.3.0).
Comment 24 Shmerl 2017-08-04 06:42:45 UTC
(In reply to Philipp Überbacher from comment #23)
>
> The latest kernel I tried this with is 4.12.3, does that qualify? (mesa
> 17.1.5, xf86-video-amdgpu 1.3.0).

Did you build it from here: https://cgit.freedesktop.org/~agd5f/linux/tree/ or used some other method?
Comment 25 Shmerl 2017-08-06 04:18:18 UTC
Just tested it with stock Linux kernel 4.12.2 (from Debian experimental) and latest Mesa 17.3.0-devel (git-293b3e0a3f). The freeze still happens.
Comment 26 Shmerl 2017-08-10 00:50:36 UTC
Is there anything else useful that can be done to help Mesa / kernel developers to nail it down?
Comment 27 Samuel Pitoiset 2017-08-10 10:02:13 UTC
An apitrace that reproduces the issue would be very useful.
Comment 28 Shmerl 2017-08-10 15:20:56 UTC
(In reply to Samuel Pitoiset from comment #27)
> An apitrace that reproduces the issue would be very useful.

There is one example already from Philipp Überbacher above in the comments: https://bugs.freedesktop.org/show_bug.cgi?id=101731#c16

I tried to record this with apitrace too, but strangely, the freeze doesn't happen while it's recording. Somehow the act of recording itself prevents it.

I'll re-record it anyway, and will post here.
Comment 29 Shmerl 2017-08-13 22:49:09 UTC
(In reply to Samuel Pitoiset from comment #27)
> An apitrace that reproduces the issue would be very useful.

See the trace here: https://ufile.io/i6czx

It's using Wine 2.14 with these patches: dark ground patch and:

ntdll-Grow_Virtual_Heap wined3d-buffer_create wined3d-sample_c_lz wined3d-Copy_Resource_Typeless xaudio2-get_al_format

And a commented-out portion that checks for GLX_OML_sync_control (as recommended by Józef Kucia in the Wine bug, since apitrace chokes on GLX_OML_sync_control).

However, while it freezes the system when the game is run on its own in the above configuration, when it's being traced, the freeze doesn't happen.

Anyway, this will probably be of interest for finding the underlying issue in Mesa / amdgpu. Separately, I figured out that the freeze is gone if Wine is built without this patchset: wined3d-Copy_Resource_Typeless.
Comment 30 Shmerl 2017-08-14 02:33:41 UTC
Actually, even though the above freeze is gone if Wine is built right, there is still a freeze happening around the Velen area (in Devil's Pit). I'm not sure if it's related to the above; I'll try reproducing it, but the above trace should at least give some idea of what to investigate in amdgpu / radeonsi already. Stuff shouldn't just freeze the system.
Comment 31 Shmerl 2017-08-15 02:58:32 UTC
(In reply to Samuel Pitoiset from comment #27)
> An apitrace that reproduces the issue would be very useful.

I uploaded another trace: https://ufile.io/9z5yc

It's a problematic area (Devil's Pit) which hangs the game even when the Velen intro works. It doesn't hang when traced, but it hangs quite reliably without tracing when you just turn the camera around. Also, due to the very intensive load, it's hard to record the trace: everything moves very slowly.

I compressed it with pixz, so you can decompress it faster as well (pixz -d). It's also compatible with regular xz.
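Since the trace is a regular .xz stream, it can also be unpacked without pixz. A minimal sketch using Python's standard `lzma` module, streaming in chunks so a multi-gigabyte trace need not fit in memory; the function names are hypothetical.

```python
import io
import lzma
import shutil
from typing import BinaryIO


def decompress_xz(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1 << 20) -> None:
    """Stream-decompress an .xz file object into dst, chunk by chunk."""
    # lzma.open() accepts an already-open binary file object (and leaves it
    # open afterwards); like the xz tool, it decodes concatenated streams.
    with lzma.open(src, "rb") as xz:
        shutil.copyfileobj(xz, dst, chunk_size)


def decompress_xz_bytes(data: bytes) -> bytes:
    """Convenience wrapper for small in-memory inputs."""
    out = io.BytesIO()
    decompress_xz(io.BytesIO(data), out)
    return out.getvalue()
```

For example, `with open("trace.xz", "rb") as src, open("trace", "wb") as dst: decompress_xz(src, dst)`, where both file names are placeholders for the downloaded file and its output.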
Comment 32 Shmerl 2017-08-23 00:49:02 UTC
The problem still happens with kernel 4.13:

OpenGL renderer string: AMD Radeon (TM) RX 480 Graphics (POLARIS10 / DRM 3.18.0 / 4.13.0-rc5-amd64, LLVM 5.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.3.0-devel (git-f24cf82d6d)

I'm using latest Wine master with needed patches.
Comment 33 Shmerl 2017-08-23 01:00:18 UTC
Created attachment 133707 [details]
Save file near freeze area (Devil's Pit, Velen)

Just turn around a bit; looking in the direction of the sun, in particular, seems to trigger the freeze.
Comment 34 Shmerl 2017-09-04 17:00:28 UTC
(In reply to Samuel Pitoiset from comment #27)
> An apitrace that reproduces the issue would be very useful.

Hi Samuel. Any luck with reproducing or narrowing down this problem? The uploaded trace is going to expire soon. Let me know if you need another one, or anything else to help.
Comment 35 Pablo Estigarribia 2017-09-04 17:44:34 UTC
Could it be related to dpm? 

In my case I was trying many combinations of Mesa versions, libdrm and kernels, but after many tests I just changed dpm to high performance and no freeze happened anymore.
Then I disabled dpm, and I haven't had a freeze for weeks.

My report: https://bugs.freedesktop.org/show_bug.cgi?id=101976
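The comment above does not say how the DPM level was changed. On amdgpu, one common way is the `power_dpm_force_performance_level` sysfs file; the sketch below assumes that interface, card index 0 by default, and root privileges for writing. The accepted values listed are an assumption covering the basic levels only, and the helper names are hypothetical.

```python
from pathlib import Path

# Basic levels only; newer kernels accept more values (assumption).
LEVELS = ("auto", "low", "high")


def perf_level_path(card: int = 0) -> Path:
    """sysfs file controlling the forced DPM performance level of cardN."""
    return Path(f"/sys/class/drm/card{card}/device/power_dpm_force_performance_level")


def get_perf_level(card: int = 0) -> str:
    return perf_level_path(card).read_text().strip()


def set_perf_level(level: str, card: int = 0) -> None:
    if level not in LEVELS:
        raise ValueError(f"unexpected DPM level {level!r}; expected one of {LEVELS}")
    # Writing requires root, and the setting does not survive a reboot.
    perf_level_path(card).write_text(level)
```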
Comment 36 Shmerl 2017-09-04 19:21:42 UTC
(In reply to Pablo Estigarribia from comment #35)
> Could it be related to dpm? 
> 
> In my case I was trying many combinations of mesa versions, libdrm and
> kernels, but until many tests I have just changed dpm to high performance
> and no freeze happended anymore. 


I tested your change, setting dpm to high. It didn't help, the freeze is still happening, so it must be something else.
Comment 37 Samuel Pitoiset 2017-09-05 12:25:02 UTC
(In reply to Shmerl from comment #34)
> (In reply to Samuel Pitoiset from comment #27)
> > An apitrace that reproduces the issue would be very useful.
> 
> Hi Samuel. Any luck with reproducing or narrowing down this problem? The
> uploaded trace is going to expire soon. Let me know if you need another one,
> or anything else to help.

No, I can't reproduce the issue with the trace on my system. I should probably set up a wine install at some point.
Comment 38 Shmerl 2017-09-05 14:55:09 UTC
(In reply to Samuel Pitoiset from comment #37)
> No, I can't reproduce the issue with the trace on my system. I should
> probably set up a wine install at some point.

Let me know if you need a GOG key for TW3. I've spoken to GOG Linux folks, and they are willing to help Mesa developers with this.
Comment 39 Shmerl 2017-09-06 00:35:21 UTC
For reference, I just tested it with Linux 4.13.0 using the amdgpu display code branch from AMD. Unfortunately the freeze still happens with it.
Comment 40 Samuel Pitoiset 2017-09-14 14:58:01 UTC
How do I load a save game file? Where are they stored?
Comment 41 Shmerl 2017-09-14 15:05:44 UTC
(In reply to Samuel Pitoiset from comment #40)
> How to load a save game file? Where are they stored?

They should be in:

"${WINEPREFIX}/drive_c/users/$USER/My Documents/The Witcher 3/gamesaves"

I.e. it depends on what prefix you used.

Note that a GOTY save file won't work with other versions because of some minor incompatibilities. Even the Steam version with all expansions isn't the same as the GOG GOTY one. If you need the latter, let me know. Linux GOG developers said they can provide a key.
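Locating the save directory for a given prefix can be sketched as follows, assuming the path layout quoted above and Wine's usual fallback to `~/.wine` when `WINEPREFIX` is unset. The attached saves are archives and need to be extracted into this directory first. The function name is hypothetical.

```python
import getpass
import os
from pathlib import Path
from typing import Optional


def witcher3_save_dir(wineprefix: Optional[str] = None, user: Optional[str] = None) -> Path:
    """Save-game directory inside a Wine prefix, following the layout above."""
    # Fall back to $WINEPREFIX, then to Wine's default prefix ~/.wine.
    prefix = Path(wineprefix or os.environ.get("WINEPREFIX") or Path.home() / ".wine")
    user = user or getpass.getuser()
    return prefix / "drive_c" / "users" / user / "My Documents" / "The Witcher 3" / "gamesaves"
```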
Comment 42 Samuel Pitoiset 2017-09-14 15:24:34 UTC
What's your Steam AppID?
Comment 43 Lukas Jirkovsky 2017-09-14 21:56:03 UTC
I'm having the same problem during the initial cutscene in Velen.

Here is some additional information:

* While the computer seems frozen, it's not frozen completely. I can still connect over ssh and do stuff there as if nothing happened. Other services work uninterrupted, too.

* Locally, only SysRq helps. Even after killing everything using Alt+SysRq+i the computer doesn't react to anything apart from more SysRq shortcuts.

* dmesg doesn't contain anything useful

* Xorg.0.log doesn't contain anything useful either (on the wine bug there is a mention about input devices being removed, but that doesn't appear here unless forced using SysRq).

Happens with AMD RX 480 with mesa 17.2.0 and linux kernel 4.13.2
Comment 44 Shmerl 2017-09-14 22:05:35 UTC
(In reply to Lukas Jirkovsky from comment #43)
> Here are some additional information:

Yes, I observed that as well. You can access the box over ssh, but it doesn't react to any local input. Also, attempts to reboot it remotely hang (systemctl reboot). And the lack of any sensible info in the logs is just strange.
Comment 45 Jan Vesely 2017-09-15 17:05:48 UTC
(In reply to Shmerl from comment #44)
> (In reply to Lukas Jirkovsky from comment #43)
> > Here are some additional information:
> 
> Yes, I observed that as well. You can access the box over ssh, but it
> doens't react to any local input. Also attempts to reboot it remotely hang
> (systemctl reboot). And lack of any sensible info in the logs is just
> strange.

Sounds like a hung GPU. AFAIK amdgpu.ko does not support GPU timeout/reset yet.
You can try resetting the GPU manually via
/sys/class/drm/cardX/device/reset
Comment 46 Shmerl 2017-09-15 17:14:25 UTC
(In reply to Jan Vesely from comment #45)
> sounds like hung GPU. afaik amdgpu.ko does not support GPU timeout/reset yet.
> you can try reseting the GPU manually via 
> /sys/class/drm/cardX/device/reset

How exactly, by writing 1 there?
Comment 47 Lukas Jirkovsky 2017-09-15 20:20:03 UTC
(In reply to Jan Vesely from comment #45)
> sounds like hung GPU. afaik amdgpu.ko does not support GPU timeout/reset yet.
> you can try reseting the GPU manually via 
> /sys/class/drm/cardX/device/reset

There's no such file on my system. There is a reset file for other PCI buses, but not for the GPU.
Comment 48 Shmerl 2017-09-15 21:02:03 UTC
(In reply to Lukas Jirkovsky from comment #47)
> There's no such file on my system. There is a reset file for other PCI
> busses, but not for the GPU.

I don't have it either for RX 480 card.
Comment 49 Alex Deucher 2017-09-15 21:14:21 UTC
You can force a reset by reading /sys/kernel/debug/dri/0/amdgpu_gpu_reset but very few if any applications currently use the GL robustness extensions to query if the context is lost and resubmit their state.
Comment 50 Shmerl 2017-09-15 21:49:26 UTC
(In reply to Alex Deucher from comment #49)
> You can force a reset by reading /sys/kernel/debug/dri/0/amdgpu_gpu_reset
> but very few if any applications currently use the GL robustness extensions
> to query if the context is lost and resubmit their state.

For a test, I tried doing

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_reset

during normal desktop operation (in this setup it's card 1), and it just messes up KDE / sddm; even restarting sddm isn't enough after that (a soft reboot was).

Then I tested it after The Witcher 3 freeze described above (remotely, over ssh). That caused a complete hang in which even ssh stopped working, so it required a hard reboot.
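The debugfs trigger from comment 49 can be wrapped so that the amdgpu minor number does not have to be guessed (it was 1, not 0, in the setup above). A sketch, assuming debugfs is mounted at `/sys/kernel/debug` and that only amdgpu devices expose `amdgpu_gpu_reset`; the function names are hypothetical. It must run as root and, as the comment above shows, may hang the machine outright, so treat it as an experiment only.

```python
from pathlib import Path
from typing import List

DEBUGFS_DRI = Path("/sys/kernel/debug/dri")


def amdgpu_minors(root: Path = DEBUGFS_DRI) -> List[int]:
    """DRI minors whose debugfs directory exposes amdgpu_gpu_reset."""
    return sorted(
        int(p.parent.name)
        for p in root.glob("*/amdgpu_gpu_reset")
        if p.parent.name.isdigit()
    )


def force_gpu_reset(minor: int, root: Path = DEBUGFS_DRI) -> str:
    # Same as `sudo cat /sys/kernel/debug/dri/<minor>/amdgpu_gpu_reset`:
    # reading the file is what triggers the reset.
    return (root / str(minor) / "amdgpu_gpu_reset").read_text()
```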
Comment 51 Samuel Pitoiset 2017-09-19 21:02:48 UTC
Created attachment 134349 [details] [review]
special varying hack

Guys, can you apply the proposed special hacky patch and try to reproduce the hang? It should, at least, partially "fix" the issue in the Velen area (cf the savegame file).

To be sure the hack is enabled, please redirect stderr (wine witcher3.exe &> log) and look for "*** The Witcher 3 SPECIAL HACK ENABLED ***".

If the game exits with "Aborted! TFB varyings not correctly set!", there is something else going on, but I wouldn't be surprised, as the patch is a huge hack just used to demonstrate the issue. Please report back anyway. Thanks!
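Checking a captured log (`wine witcher3.exe &> log`) for the markers named above can be automated. A sketch, assuming the exact marker strings from this comment plus the `Mesa: User error:` prefix that Mesa prints when `MESA_DEBUG` is set, as seen in the logs later in this thread; the function name is hypothetical.

```python
from typing import Dict, List, Union

HACK_ENABLED = "*** The Witcher 3 SPECIAL HACK ENABLED ***"
TFB_ABORT = "Aborted! TFB varyings not correctly set!"
MESA_USER_ERROR = "Mesa: User error:"


def summarize_hack_log(text: str) -> Dict[str, Union[bool, List[str]]]:
    """Report whether the hack was active, whether it aborted, and any Mesa errors."""
    lines = [line.strip() for line in text.splitlines()]
    return {
        "hack_enabled": HACK_ENABLED in lines,  # patched Mesa was loaded
        "tfb_aborted": TFB_ABORT in lines,      # hack caught unset TFB varyings
        "mesa_user_errors": [line for line in lines if line.startswith(MESA_USER_ERROR)],
    }
```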
Comment 52 Shmerl 2017-09-20 03:18:37 UTC
(In reply to Samuel Pitoiset from comment #51)
> Guys, can you apply the proposed special hacky patch and try to reproduce
> the hang? It should, at least, partially "fix" the issue in the Velen area
> (cf the savegame file).
> 

I applied your hack patch, and here is the output I got (with my other settings active):

ATTENTION: default value of option mesa_glthread overridden by environment.
*** The Witcher 3 SPECIAL HACK ENABLED ***
Aborted! TFB varyings not correctly set!
source->Id = 250
AL lib: (EE) alc_cleanup: 2 devices not closed

The game indeed aborts, rather than hangs there.
Comment 53 Samuel Pitoiset 2017-09-20 07:24:39 UTC
Okay, that's expected. Didn't you get some Mesa user errors as well?

But the fact that it no longer hangs is good news, somehow. :)
Comment 54 Shmerl 2017-09-20 07:38:37 UTC
Created attachment 134355 [details]
Hack patch debug run log

Run with MESA_DEBUG=true and Wine logging enabled.
Comment 55 Samuel Pitoiset 2017-09-20 07:52:37 UTC
Okay, the hack doesn't work for you: Mesa fails to link because the varying name is not the same.

What version of wine are you using? FWIW, I'm building my local copy from bb16263fe1974851f495435fef9a3d57fa2d4aa9 with all wine-staging patches applied on top of that commit.
Comment 56 Shmerl 2017-09-20 07:56:53 UTC
(In reply to Samuel Pitoiset from comment #55)
> Okay, the hack doesn't work for you, Mesa fails to link because the varying
> name is not the same.
> 
> What version of wine are you using? FWIW, I'm building my local copy from
> bb16263fe1974851f495435fef9a3d57fa2d4aa9 with all wine-staging patches
> applied on top of that commit.

Ah, I'm not using full staging, but regular Wine (relatively recent master build) with minimal patchsets required to run the game (as described here: https://appdb.winehq.org/objectManager.php?sClass=version&iId=34698#notes ).

Let me try it with full staging 2.17.
Comment 57 Shmerl 2017-09-20 08:02:47 UTC
Here is the run with Wine staging 2.17 (MESA_DEBUG set):

ATTENTION: default value of option mesa_glthread overridden by environment.
*** The Witcher 3 SPECIAL HACK ENABLED ***
Mesa: User error: GL_INVALID_OPERATION in glGetUniformLocation(program not linked)
Mesa: 244 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_OPERATION in glUseProgram(program 662 not linked)
Mesa: 1 similar GL_INVALID_OPERATION errors
Mesa: User error: GL_INVALID_OPERATION in glBeginTransformFeedback(no varyings to record)
Aborted! TFB varyings not correctly set!
source->Id = 250
AL lib: (EE) alc_cleanup: 2 devices not closed

It looks slightly different than before.
Comment 58 Shmerl 2017-09-20 08:04:50 UTC
I suppose I can also build Wine from that commit and apply all staging patches including past 2.17.
Comment 59 Shmerl 2017-09-20 08:06:26 UTC
Actually, looks like 2.17 is the last one, so their official build should be just that. It's based on commit bb16263fe1974851f495435fef9a3d57fa2d4aa9
Comment 60 Samuel Pitoiset 2017-09-20 08:11:02 UTC
Yeah, I built against the same commit and I'm able to reproduce the link-time error.
Comment 61 Samuel Pitoiset 2017-09-20 08:14:53 UTC
Created attachment 134356 [details] [review]
updated special varying hack

What about this updated patch? (The previous one has to be reverted.)
Comment 62 Shmerl 2017-09-20 08:28:40 UTC
I'll give it a try. Maybe game settings affect what's going on too. For reference, I set everything to max, except HairWorks (off). Ambient occlusion: HBAO+.
Comment 63 Samuel Pitoiset 2017-09-20 09:13:32 UTC
*** Bug 102797 has been marked as a duplicate of this bug. ***
Comment 64 Samuel Pitoiset 2017-09-20 09:36:11 UTC
See the attached trace from https://bugs.freedesktop.org/show_bug.cgi?id=102797, it reproduces the same issue.

So, basically, the issue is that Wine fails to set the transform feedback (TFB) varyings in some situations, which explains why the following message is reported: "fixme:d3d_shader:shader_glsl_generate_transform_feedback_varyings Unsupported component range 2-2.". The GPU then hangs later on because it reads garbage from a TFB buffer.

About TW3, I think the game uses TFB only in some scenarios; I don't know why or when. Maybe it's based on some occlusion queries or some time constraints? Either way, this might explain why TFB is not used when tracing with apitrace or when using "GALLIUM_DDEBUG=800", which flushes and waits 800 ms after every draw call.

The attached patches should workaround both issues (TW3 and Superposition), but wine has to be fixed here.

Please leave the bug open until it's really fixed.
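To make the failure mode concrete: a Direct3D stream-output declaration can request a sub-range of an output register's components, while `glTransformFeedbackVaryings` takes names of output variables plus padding names such as `gl_SkipComponents1` through `gl_SkipComponents4`. Reading "range 2-2" as "only component z" is an assumption. The Python model below is hypothetical and is not Wine's code; it only shows the safer behavior of refusing to build a varying list it cannot express, instead of leaving the varyings unset and letting the GPU read garbage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamOutEntry:
    """Hypothetical model of one D3D stream-output declaration entry."""
    output: Optional[str]  # GLSL output name; None means "skip components"
    start_component: int   # 0..3 = x, y, z, w
    component_count: int   # 1..4


class UnsupportedStreamOut(Exception):
    pass


def tfb_varyings(entries: List[StreamOutEntry]) -> List[str]:
    """Names to pass to glTransformFeedbackVaryings, or raise if impossible."""
    names: List[str] = []
    for e in entries:
        first = e.start_component
        last = e.start_component + e.component_count - 1
        if not 0 <= first <= last <= 3:
            raise ValueError(f"invalid component range {first}-{last}")
        if e.output is None:
            # Gaps can be padded with the built-in skip names.
            names.append(f"gl_SkipComponents{e.component_count}")
        elif first == 0 and last == 3:
            # This model assumes vec4 outputs, so only xyzw maps to a plain name.
            names.append(e.output)
        else:
            raise UnsupportedStreamOut(f"Unsupported component range {first}-{last}")
    if not names:
        raise UnsupportedStreamOut("no varyings to record")
    return names
```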
Comment 65 Shmerl 2017-09-20 16:15:43 UTC
(In reply to Samuel Pitoiset from comment #61)
> Created attachment 134356 [details] [review]
> updated special varying hack
> 
> What about this updated patch? (the previous has to be reverted).

Great! I can confirm this patch helps with both full staging and regular Wine with minimal patches. Thanks! I'll point the Wine developers to this.
Comment 66 Shmerl 2017-09-20 16:22:00 UTC
(In reply to Samuel Pitoiset from comment #64)
> 
> The attached patches should workaround both issues (TW3 and Superposition),
> but wine has to be fixed here.
> 
> Please, let the bug open until it's really fixed.

While Wine does something incorrect here, shouldn't amdgpu/radeonsi still handle such kind of issues more gracefully? I.e. while Wine should be fixed, I think Mesa shouldn't cause a system freeze when that happens. Can your patch approach be generally useful for Mesa to make it more resilient, or some other solution would be needed?
Comment 67 Lukas Jirkovsky 2017-09-20 19:56:22 UTC
I can confirm that it works fine here after applying the hack, too.

Anyway, I'm with Shmerl here. In my opinion a user process should never be able to make the system unusable, no matter what kind of stupid stuff it does. I'm fine with the application crashing or behaving incorrectly; it's that application's fault, after all. Just don't take the system down with it.

Also, great work, thank you!
Comment 68 Samuel Pitoiset 2017-09-21 14:49:44 UTC
Thanks for confirming that the hack actually works.

Yeah, it would be better not to hang in such a situation, but that's complicated. Though, you can try to boot with amdgpu.lockup_timeout=3000 (i.e. wait 3 s) to recover the state when a lockup is detected; it might work.
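After adding the parameter to the kernel command line and rebooting, a quick check that it took effect could look like this. The sketch only parses `/proc/cmdline` and assumes the single-integer millisecond form used above; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Value of `name=value` on a kernel command line (later occurrences override)."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def amdgpu_lockup_timeout_ms(cmdline: Optional[str] = None) -> Optional[int]:
    # Assumes a single integer in milliseconds, e.g. amdgpu.lockup_timeout=3000.
    if cmdline is None:
        cmdline = Path("/proc/cmdline").read_text()
    value = kernel_param(cmdline, "amdgpu.lockup_timeout")
    return int(value) if value else None
```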
Comment 69 aidan 2017-09-21 15:03:22 UTC
(In reply to Samuel Pitoiset from comment #61)
> Created attachment 134356 [details] [review]
> updated special varying hack
> 
> What about this updated patch? (the previous has to be reverted).

What commit should this patch be applied to?  It fails when applying to mesa 17.2.1:

patching file src/mesa/main/transformfeedback.c
Hunk #1 succeeded at 421 with fuzz 1 (offset 14 lines).
Hunk #2 succeeded at 1117 with fuzz 2 (offset 256 lines).
Hunk #3 FAILED at 870.
Hunk #4 FAILED at 879.
2 out of 4 hunks FAILED -- saving rejects to file src/mesa/main/transformfeedback.c.rej
Comment 70 Samuel Pitoiset 2017-09-21 15:31:32 UTC
(In reply to aidan from comment #69)
> (In reply to Samuel Pitoiset from comment #61)
> > Created attachment 134356 [details] [review]
> > updated special varying hack
> > 
> > What about this updated patch? (the previous has to be reverted).
> 
> What commit should this patch be applied to?  It fails when applying to mesa
> 17.2.1:

Against git master.
Comment 71 Lukas Jirkovsky 2017-09-21 16:32:12 UTC
Created attachment 134411 [details] [review]
special varying hack backport 17.2.1

Backported the patch to apply on 17.2.1

aidan: you can use this patch.
Comment 72 Shmerl 2017-09-24 00:55:24 UTC
(In reply to Samuel Pitoiset from comment #68)
> Thanks for confirming that the hack actually works.
> 
> Yeah, it would be better to not hang in such situation but that's
> complicated.

Would that require changes to the kernel driver?
Comment 73 Shmerl 2017-09-24 19:17:04 UTC
Józef Kucia made a hack patch for Wine to prevent the freeze: https://bugs.winehq.org/show_bug.cgi?id=43273#c43
Comment 74 mirh 2017-09-24 23:15:09 UTC
Wine's role should just be to keep their... stuff from misbehaving.

But as for the freeze itself, I'd expect a bug in amdgpu if a user-level bug was *allowed* to escalate into a kernel-level one.
Comment 75 Shmerl 2017-10-04 06:25:26 UTC
I made a variant of this hack for Wine itself: https://bugs.winehq.org/attachment.cgi?id=59387
Comment 76 Shmerl 2017-10-04 06:26:31 UTC
Unlike the previous one, it's minimal and doesn't conflict with various staging patches that are also useful for TW3.
Comment 77 Józef Kucia 2017-11-08 09:17:41 UTC
This bug should be fixed now in Wine main git tree. The fix will be included in the next development release.
Comment 78 Józef Kucia 2017-11-19 12:37:15 UTC
Fixed in Wine 2.21
Comment 79 Fabian Maurer 2017-11-19 13:28:56 UTC
Nice to hear the bug is fixed in wine, but the mesa bug still exists, so the resolution is wrong. It's simply not acceptable for a driver to freeze the system if an application misbehaves.
Comment 80 mirh 2017-11-19 14:50:08 UTC
I also agree with Fabian. 
Application going crazy with its own business is totally not a "problem of the driver".. 
But compromising system stability definitely is.
Comment 81 Józef Kucia 2017-11-19 18:44:45 UTC
(In reply to mirh from comment #80)
> I also agree with Fabian. 
> Application going crazy with its own business is totally not a "problem of
> the driver".. 
> But compromising system stability definitively is.

If you want to make this bug about unreliable/unimplemented GPU resets in amdgpu.ko, then it is filed against the wrong component. AFAIK there is nothing to fix in Mesa. Other than that, the bug is full of comments about the source of the GPU hang. It may be better to file a new bug for implementing/fixing GPU resets in amdgpu.
Comment 82 mirh 2017-11-19 19:06:13 UTC
Guess it makes sense.
One thread per "actual issue to fix".

Even so, it would take nothing to just change the component from Mesa to DRI :p
Comment 83 Samuel Pitoiset 2017-11-20 09:56:06 UTC
I do agree with Jozef; it's really a different issue than the one initially filed here. Thanks again for fixing this!
Comment 84 Christian König 2017-11-20 10:42:50 UTC
Guys please keep in mind that GPUs are programmable processors.

So when an application sends a shader with an infinite loop to the driver, there is absolutely nothing the driver/Mesa stack can do about that.

As Jozef correctly pointed out, the best thing we can do is reset the GPU after a timeout, but that is really complex and doesn't always work.

Anyway closing this bug since the original issue is fixed.

