Created attachment 136726: dmesg

My workstation, running Arch Linux with an RX560 GPU, randomly reboots after a few minutes of use with the new amdgpu version in kernel 4.15. It mostly seems to happen during a full-screen redraw (such as maximizing a window). It happens regardless of desktop environment (I've tested GNOME on both Wayland and Xorg, as well as KDE). If it's left at the GDM login screen, it doesn't seem to reboot. Kernel versions 4.14 and older seem to be rock solid.

I've tried the following workarounds, but nothing seems to make any difference:

* Setting amdgpu.dc to both 1 and 0
* Disabling dpm by setting amdgpu.dpm to 0
* Upgrading Mesa from 17.3.2 to latest git master
Can you try bisecting? Make sure to test each commit for plenty of time before marking it as good.
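To put a rough number on "plenty of time", here is a minimal sketch. It assumes the time until a reboot on a bad kernel is roughly exponentially distributed, which is a modelling assumption and not something measured in this report. The function names and the example figures are hypothetical.

```python
import math

def false_good_probability(test_minutes: float, mean_minutes_to_crash: float) -> float:
    """Chance that a bad commit survives the whole test window.

    Assumes an exponential time-to-crash model (an assumption, not measured).
    """
    return math.exp(-test_minutes / mean_minutes_to_crash)

def minutes_needed(mean_minutes_to_crash: float, max_false_good: float) -> float:
    """Test duration that keeps the false-"good" chance below max_false_good."""
    return -mean_minutes_to_crash * math.log(max_false_good)

def bisect_steps(num_commits: int) -> int:
    """Rough number of commits a bisection has to test for num_commits candidates."""
    return math.ceil(math.log2(num_commits))

# Example: if a bad kernel reboots after 5 minutes on average, how long should
# each step run so a bad commit is mislabelled "good" less than 1% of the time?
print(round(minutes_needed(5, 0.01), 1), "minutes per step")
print(bisect_steps(1024), "steps for about 1024 candidate commits")
```

Under this model a single wrongly marked "good" sends the bisection to the wrong commit, so erring on the long side for each step is cheaper than repeating the whole run.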
Hi Michel,

After bisecting, the offending commit seems to be:

commit 648bc3574716400acc06f99915815f80d9563783
Author: Christian König <christian.koenig@amd.com>
Date: Thu Jul 6 09:59:43 2017 +0200

    drm/ttm: add transparent huge page support for DMA allocations v2

    Try to allocate huge pages when it makes sense.

    v2: fix comment and use ifdef

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

After reverting that, 4.15-rc8 seems to be working fine.
Does your kernel also have the following patch?

commit f4c809914a7c3e4a59cf543da6c2a15d0f75ee38
Author: Christian König <christian.koenig@amd.com>
Date: Mon Oct 9 14:34:13 2017 +0200

    drm/ttm: don't use compound pages for now

    We need to figure out first how to correctly map them into the CPU page tables.

    bug: https://bugs.freedesktop.org/show_bug.cgi?id=103138

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sorry, I just saw that you wrote you are using 4.15-rc8, which should include the patch. No idea what's going wrong here.

Could you by any chance add a serial or network console and grab the last logs before the reboot?
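For the network console option, the kernel's netconsole driver sends each log message as a UDP packet to another machine, where any UDP listener can record it. The following is a minimal sketch of such a receiver, not a tool anyone in this thread used. The function names and the log file name are hypothetical, and port 6666 is assumed because it is the documented default target port for netconsole.

```python
import socket

def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket on the port the crashing machine sends to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def receive(sock: socket.socket, out, max_packets=None) -> int:
    """Copy incoming datagrams to `out`, flushing after each one.

    Flushing per packet matters here: the last lines before the reboot
    are the interesting ones. Runs forever when max_packets is None.
    """
    count = 0
    while max_packets is None or count < max_packets:
        data, _addr = sock.recvfrom(65535)
        out.write(data.decode("utf-8", errors="replace"))
        out.flush()
        count += 1
    return count

# Usage on the receiving machine (hypothetical file name):
#     with open("netconsole.log", "a", encoding="utf-8") as log:
#         receive(open_listener(), log)
```

The receiver runs on a second machine. The sending side is configured separately through the `netconsole=` kernel parameter, whose exact value depends on the local interfaces and addresses.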
Christian,

These are the last messages from a network console. All of them are from before the crash:

[ 90.254437] [drm] {3840x2160, 4000x2222@533250Khz}
[ 91.026831] [drm] U24E590: [Block 0]
[ 91.028144] [drm] U24E590: [Block 1]
[ 91.029479] [drm] dc_link_detect: manufacturer_id = 2D4C, product_id = CD3, serial_number = 304D5844, manufacture_week = 50, manufacture_year = 26, display_name = U24E590, speaker_flag = 1, audio_mode_count = 1
[ 91.032187] [drm] dc_link_detect: mode number = 0, format_code = 1, channel_count = 1, sample_rate = 7, sample_size = 7
[ 91.033565] [drm] {3840x2160, 4000x2222@533250Khz}
[ 91.049360] [drm] {3840x2160, 4400x2250@594000Khz}
[ 102.531453] input: Surface Mouse as /devices/virtual/misc/uhid/0005:045E:0919.0006/input/input21
[ 102.531670] hid-generic 0005:045E:0919.0006: input,hidraw5: BLUETOOTH HID v1.10 Mouse [Surface Mouse] on 00:1A:7D:DA:71:15
[ 102.537616] mousedev: PS/2 mouse device common for all mice
[ 106.699544] fuse init (API version 7.26)
[ 106.974151] usb 1-3.3: 1:1: cannot get freq at ep 0x81
[ 107.942126] rfkill: input handler disabled
[ 108.240100] ISO 9660 Extensions: RRIP_1991A

I don't have a serial dongle here, so I can't use a serial console. At one point I ran the console on the integrated Intel GPU (I usually have it disabled in the BIOS) while using Xorg on the AMD GPU, and there were no messages there either.

If you come up with any ideas or patches, I'm happy to try them out.
Do not use mainline kernels or Mesa point releases. As you can see, they work unreliably because the drivers are only partially implemented. Use the kernel below and Mesa development git. Debian testing/sid with Xfce is easier and more stable than Arch Linux with the never-finished, buggy KDE and GNOME desktops. The bionic version of the Oibaf Mesa git PPA is compatible with Debian testing/sid.

https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.17-wip

No problems with my RX560 with the latest code.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/298.