Metro 2033 Redux hangs when a certain combination of Mesa version, kernel version and kernel configuration is used. This always happens on the loading screen. I have done some tests using the integrated benchmark (benchmark.sh):
linux-4.14.x + mesa-7.3.x = OK
linux-4.14.x + mesa-8.0.x / mesa-8.1.x = hang
linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=y = OK
linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=n + mesa-8.0.x / mesa-8.1.x = hang
When the hang occurs, it causes a massive slowdown of all other graphical applications. With 4.14 kernels the game process is unkillable, so it hangs somewhere in kernel space. With 4.17 kernels it can be killed, but this takes some time.
My GPU:
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga PRO [Radeon R9 285/380] [1002:6939] (rev f1) (prog-if 00 [VGA controller])
    Subsystem: PC Partner Limited / Sapphire Technology Tonga PRO [Radeon R9 285/380] [174b:e305]
Created attachment 140639: linux_metro_bisect.log
> # first bad commit: [6ed4e2e673d348df6623012a628a8ab8624e3222] drm/ttm: add transparent huge page support for wc or uc allocations v2
The bisect was done with CONFIG_TRANSPARENT_HUGEPAGE=y. This is how I got the idea to experiment with transparent huge pages. Yes, I forgot about the --term-old/--term-new bisect options :)
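When bisecting for the commit that fixed a bug rather than the one that introduced it, git's default good/bad labels end up inverted, which is why the log reports the fixing commit as the "first bad commit". git bisect accepts custom terms for this case, e.g. `git bisect start --term-old=broken --term-new=fixed`, after which each tested kernel is marked with `git bisect broken` or `git bisect fixed`. The search itself is a plain binary search for the first "fixed" revision; the Python sketch below assumes a hypothetical ordered list of commit IDs and a hypothetical `is_fixed` predicate standing in for "build this kernel and run the Metro benchmark". It ignores skipped or unbuildable commits, which real git bisect handles.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_fixed() holds.

    Assumes commits are ordered oldest -> newest, the oldest one is broken,
    the newest one is fixed, and the property flips exactly once in between.
    """
    lo, hi = 0, len(commits) - 1
    if is_fixed(commits[lo]) or not is_fixed(commits[hi]):
        raise ValueError("need a broken oldest commit and a fixed newest commit")
    # Invariant: commits[lo] is broken and commits[hi] is fixed.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid  # the fix landed after mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(10)]  # hypothetical commit IDs
    print(first_fixed(history, lambda c: int(c[1:]) >= 6))  # prints c6
```

Each step halves the remaining range, so a range of N commits needs about log2(N) test runs, which is what makes bisecting kernel builds practical.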
(In reply to Alexander Tsoy from comment #0)
> With 4.14 kernels the game process is unkillable so it hangs somewhere
> in the kernel space. With 4.17 kernels it can be killed but this
> takes some time.
The process can actually be killed in a while loop. Perf report:
$ sudo perf report | grep metro | head
    33.33%  metro  metro             [.] cbackend_OGL::delayed_upload
    31.56%  metro  [kernel.vmlinux]  [k] rb_prev
     2.07%  metro  [kernel.vmlinux]  [k] alloc_iova
     0.20%  metro  [kernel.vmlinux]  [k] __switch_to
     0.18%  metro  [kernel.vmlinux]  [k] native_load_gs_index
     0.13%  metro  [kernel.vmlinux]  [k] __x86_indirect_thunk_rax
     0.12%  metro  [kernel.vmlinux]  [k] entry_SYSCALL_64
     0.08%  metro  [kernel.vmlinux]  [k] __schedule
     0.08%  metro  [kernel.vmlinux]  [k] read_tsc
     0.07%  metro  libc-2.26.so      [.] __nanosleep
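The profile above splits almost evenly between user space (cbackend_OGL::delayed_upload in the metro binary) and the kernel, where most of the time goes to rb_prev with alloc_iova next. A small helper that totals the Overhead column per shared object makes that split explicit. It is a sketch that assumes perf report's default text columns (overhead, command, shared object, then `[.]` or `[k]` and the symbol); it is not part of perf and was not used in the original report.

```python
from collections import defaultdict

# Lines copied from the perf report output in the comment above.
SAMPLE = """\
33.33%  metro  metro             [.] cbackend_OGL::delayed_upload
31.56%  metro  [kernel.vmlinux]  [k] rb_prev
 2.07%  metro  [kernel.vmlinux]  [k] alloc_iova
 0.20%  metro  [kernel.vmlinux]  [k] __switch_to
 0.18%  metro  [kernel.vmlinux]  [k] native_load_gs_index
 0.13%  metro  [kernel.vmlinux]  [k] __x86_indirect_thunk_rax
 0.12%  metro  [kernel.vmlinux]  [k] entry_SYSCALL_64
 0.08%  metro  [kernel.vmlinux]  [k] __schedule
 0.08%  metro  [kernel.vmlinux]  [k] read_tsc
 0.07%  metro  libc-2.26.so      [.] __nanosleep
"""


def overhead_by_dso(report: str) -> dict:
    """Sum the Overhead column of `perf report` text per shared object.

    Assumes the default column order: overhead, command, shared object,
    then "[.]" (user space) or "[k]" (kernel) followed by the symbol.
    """
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        # Skip blank lines, "#" header/comment lines and anything unexpected.
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        totals[fields[2]] += float(fields[0].rstrip("%"))
    return dict(totals)


if __name__ == "__main__":
    totals = overhead_by_dso(SAMPLE)
    for dso, pct in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{dso:18} {pct:6.2f}%")
```

On these ten lines the helper gives about 34.42% for [kernel.vmlinux], 33.33% for metro and 0.07% for libc-2.26.so. Because the listing was filtered with `head`, these are lower bounds rather than complete totals.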
(In reply to Alexander Tsoy from comment #0)
> linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=y = OK
> linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=n + mesa-8.0.x / mesa-8.1.x = hang
Did you swap CONFIG_TRANSPARENT_HUGEPAGE=y/n here? I.e. CONFIG_TRANSPARENT_HUGEPAGE=y is bad, CONFIG_TRANSPARENT_HUGEPAGE=n is good? If not, how exactly did you bisect with CONFIG_TRANSPARENT_HUGEPAGE=y?
(In reply to Michel Dänzer from comment #3)
> (In reply to Alexander Tsoy from comment #0)
> > linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=y = OK
> > linux-4.17.x with CONFIG_TRANSPARENT_HUGEPAGE=n + mesa-8.0.x / mesa-8.1.x = hang
>
> Did you swap CONFIG_TRANSPARENT_HUGEPAGE=y/n here? I.e.
> CONFIG_TRANSPARENT_HUGEPAGE=y is bad, CONFIG_TRANSPARENT_HUGEPAGE=n is good?
Yes, after getting a clue that this bug could be related to transparent huge pages, I tried disabling CONFIG_TRANSPARENT_HUGEPAGE in a 4.17.6 kernel. This resulted in the same hang I had with 4.14.x kernels. Note that transparent huge pages must be disabled at build time; the kernel command-line option "transparent_hugepage=never" doesn't change anything.
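The build-time option and the runtime policy are separate switches: "transparent_hugepage=never" only sets the runtime policy, while CONFIG_TRANSPARENT_HUGEPAGE decides what is compiled into the kernel, and the observation above suggests the hang depends on the latter. The Python sketch below checks both on a running system. The sysfs path and its bracketed format ("always [madvise] never") are standard; /proc/config.gz is only available with CONFIG_IKCONFIG_PROC, and /boot/config-<release> is a distribution convention rather than a kernel guarantee.

```python
import gzip
import platform
import re
from pathlib import Path
from typing import Optional


def thp_built_in(kconfig: str) -> bool:
    """True if the kernel config text contains CONFIG_TRANSPARENT_HUGEPAGE=y.

    A disabled option appears as "# CONFIG_TRANSPARENT_HUGEPAGE is not set".
    """
    return re.search(r"^CONFIG_TRANSPARENT_HUGEPAGE=y$", kconfig, re.MULTILINE) is not None


def thp_runtime_mode(enabled_file: str) -> str:
    """Return the active THP policy from the sysfs "enabled" file.

    The kernel marks the active choice with brackets, e.g. "always [madvise] never".
    """
    match = re.search(r"\[(\w+)\]", enabled_file)
    if match is None:
        raise ValueError(f"unexpected format: {enabled_file!r}")
    return match.group(1)


def load_running_config() -> Optional[str]:
    """Read the running kernel's config from the usual locations, if present."""
    proc = Path("/proc/config.gz")  # needs CONFIG_IKCONFIG_PROC
    if proc.exists():
        with gzip.open(proc, "rt") as f:
            return f.read()
    boot = Path("/boot") / f"config-{platform.release()}"  # distro convention
    if boot.exists():
        return boot.read_text()
    return None


if __name__ == "__main__":
    config = load_running_config()
    if config is None:
        print("kernel config not found")
    else:
        print("CONFIG_TRANSPARENT_HUGEPAGE=y:", thp_built_in(config))
    # This sysfs file only exists when THP support is compiled in.
    sysfs = Path("/sys/kernel/mm/transparent_hugepage/enabled")
    if sysfs.exists():
        print("runtime policy:", thp_runtime_mode(sysfs.read_text()))
```

A kernel booted with "transparent_hugepage=never" still reports CONFIG_TRANSPARENT_HUGEPAGE=y here; only the runtime policy changes to "never".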
To clarify a bit: the "first bad commit" reported by the bisect is actually the first good commit, i.e. the one that fixed the hangs in Metro.
(In reply to Alexander Tsoy from comment #5)
> To clarify a bit: first bad commit in bisect is actually the first good
> commit that fixed hangs in Metro.
But only when transparent huge pages are enabled, of course.
Created attachment 140964: dmesg
Same problem with the latest amd-staging-drm-next (commit bf1fd52b0632cd17ac875432a36d3e92be96d8cb). Now the kernel gives me the following errors:
[ 324.552371] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* amdgpu_cs_list_validate(validated) failed.
[ 324.561030] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
With CONFIG_TRANSPARENT_HUGEPAGE=y the same kernel works fine.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/447.