Summary: | AMDGPU driver keeps reloading on hybrid graphics system causing stuttering.
---|---
Product: | DRI
Component: | DRM/AMDgpu
Status: | RESOLVED FIXED
Severity: | critical
Priority: | highest
Version: | unspecified
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
Reporter: | Ransu <gero3977>
Assignee: | Default DRI bug account <dri-devel>
CC: | gero3977, mike
Description
Ransu
2018-09-17 04:52:52 UTC
By stuttering I mean every animation stops for a second or so. This happens every few seconds, and it's especially annoying during video playback.

The messages appear when the AMD GPU is woken up due to userspace using some functionality of the amdgpu kernel driver.

Does amdgpu.runpm=0 on the kernel command line prevent the stuttering? If so, I hope somebody can help you track down how userspace is causing the AMD GPU to wake up. (Note that the above will keep the AMD GPU powered on all the time; it's intended to confirm that the stuttering is related to powering up the AMD GPU, not as a permanent solution.)

Please attach the Xorg log file and the output of dmesg and xrandr.

Created attachment 141597 [details]
Xorg.0.log (log 1)
Created attachment 141598 [details]
xrandr information (log 1)
Created attachment 141599 [details]
dmesg -w (log 1)
Created attachment 141600 [details]
xorg.config (log 1)
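The amdgpu.runpm=0 test suggested above can be cross-checked from userspace. The following is a minimal sketch, not part of the original report: it parses a kernel command line for amdgpu.runpm=0 and reads the PCI device's `power/runtime_status` file in sysfs, which reports values such as `active` or `suspended`. The function names are hypothetical, and the PCI address 0000:05:00.0 is only an example taken from the reporter's logs later in this thread; check `lspci` for the address on your own system.

```python
from pathlib import Path

# Assumption: PCI address of the discrete GPU. This value is an example only.
DEFAULT_PCI_ADDR = "0000:05:00.0"


def runtime_status(pci_addr=DEFAULT_PCI_ADDR, sysfs_root="/sys"):
    """Return the kernel's runtime PM state for a PCI device, e.g. 'active' or 'suspended'."""
    path = (Path(sysfs_root) / "bus" / "pci" / "devices"
            / pci_addr / "power" / "runtime_status")
    return path.read_text().strip()


def cmdline_params(cmdline):
    """Split a kernel command line into a dict; flags without '=' map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def runpm_disabled(cmdline):
    """True if amdgpu.runpm=0 is present on the given kernel command line."""
    return cmdline_params(cmdline).get("amdgpu.runpm") == "0"
```

On a live system, `runpm_disabled(Path("/proc/cmdline").read_text())` shows whether the option took effect, and calling `runtime_status()` once a second while a video plays should show the device flipping between `suspended` and `active` if something keeps waking it.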
All attachments for this comment were marked "log 1". This includes my current xorg.conf.

At the time of the attachments the kernel command line was as follows, with sensitive information left out:

BOOT_IMAGE=/vmlinuz-linux root=UUID=<REDACTED> rw cryptdevice=/dev/disk/by-uuid/<REDACTED> radeon.si_support=0 amdgpu.si_support=1 memmap=10M$2245M quiet resume=UUID=<REDACTED>

Adding amdgpu.runpm=0 helps big time, but I know this means both the AMD and Intel GPUs would be running all the time, which is not ideal. As I said before, I would like to have the Intel GPU running as my main GPU and only activate the AMD GPU when I want to make use of a better GPU, preferably with PRIME.

I'm not seeing stuttering, but I do see AMDGPU loading up each time I play a video in MPV or Chromium.

The last time I saw this, something was using drmGetDevice rather than drmGetDevice2.

I did a quick grep on libraries that contain drmGetDevice and drmGetDevice2 and did a diff:

-Binary file /usr/lib64/libva-drm.so.2.200.0 matches
@@ -6 +4,0 @@
-Binary file /usr/lib64/libva-wayland.so.2.200.0 matches
@@ -13,3 +10,0 @@
-Binary file /usr/lib64/xorg/modules/drivers/modesetting_drv.so matches
-Binary file /usr/lib64/xorg/modules/drivers/amdgpu_drv.so matches
-Binary file /usr/lib64/xorg/modules/libglamoregl.so matches

My guess is they're the most likely candidates for this happening.

If this is a library file issue, how should I go about fixing this? Does this need an upstream or mainline fix?
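The grep-and-diff step described above could be approximated as follows. This is a rough sketch with hypothetical function names, not the commands actually used in the comment: it reads each shared library as raw bytes and reports files that contain the NUL-terminated name drmGetDevice but not drmGetDevice2. Matching the trailing NUL avoids also counting drmGetDevices or drmGetDeviceNameFromFd, which a plain substring grep would. Finding the name only shows the library references the symbol; it does not prove the call is what wakes the GPU.

```python
from pathlib import Path


def references_symbol(data, name):
    """True if `data` contains `name` as a complete NUL-terminated string."""
    return name.encode() + b"\0" in data


def uses_old_call_only(path, old="drmGetDevice", new="drmGetDevice2"):
    """True if the file references `old` but never `new`."""
    data = Path(path).read_bytes()
    return references_symbol(data, old) and not references_symbol(data, new)


def scan(directory, pattern="*.so*"):
    """Return sorted paths under `directory` that only reference the old call."""
    return sorted(
        str(p) for p in Path(directory).rglob(pattern)
        if p.is_file() and uses_old_call_only(p)
    )
```

For example, `scan("/usr/lib64")` would list candidate libraries on a system laid out like the one in the comment above; the directory is an assumption and differs between distributions.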
I'm not sure; I'm hoping I might be pointing the devs in the right direction.

I'm not sure if drmGetDeviceNameFromFd vs drmGetDeviceNameFromFd2 could cause the issue too - I think it might.

I found in the xserver:

hw/xfree86/dri2/dri2.c: if (drmGetDevice(info->fd, &dev) || dev->bustype != DRM_BUS_PCI) {
hw/xfree86/drivers/modesetting/dri2.c: info.deviceName = drmGetDeviceNameFromFd(ms->fd);
drm/xf86drm.c:int drmGetDevices(drmDevicePtr devices[], int max_devices)

In the AMDGPU DDX:

src/amdgpu_dri2.c: info->dri2.device_name = drmGetDeviceNameFromFd(pAMDGPUEnt->fd);

And in libva:

va/drm/va_drm_utils.c: name = drmGetDeviceNameFromFd(fd);

I notice in the old libva1 code there was no drmGetDevice stuff, and it's only in libva2 that I could find the above reference.

Tonight when I have access to my laptop I'll try switching those two to the '2' versions and see if it stops the issues, unless anyone else has any better ideas.

(In reply to Mike Lothian from comment #13)
> hw/xfree86/dri2/dri2.c: if (drmGetDevice(info->fd, &dev) || dev->bustype != DRM_BUS_PCI) {
> hw/xfree86/drivers/modesetting/dri2.c: info.deviceName = drmGetDeviceNameFromFd(ms->fd);
>
> [...]
>
> src/amdgpu_dri2.c: info->dri2.device_name = drmGetDeviceNameFromFd(pAMDGPUEnt->fd);

These are only called during X server startup.

> va/drm/va_drm_utils.c: name = drmGetDeviceNameFromFd(fd);

This should only be called when a video player using VA-API runs standalone, not via X (or Wayland), and even then only once.

Try running

sudo perf record -e rpm:rpm_resume --call-graph=dwarf

in a terminal, then do whatever is needed to reproduce the problem, then interrupt the perf command with Ctrl-C and attach the output of

sudo perf report --header

Weird, it was only showing i915 resumes, no amdgpu ones - even though dmesg clearly shows the card powering up.

Created attachment 141631 [details]
pref report --header
Created attachment 141632 [details]
Perf data
So the --header didn't show anything; however, the raw data does seem to show something amdgpu related.
Created attachment 141633 [details]
Report of amdgpu:*
I've repeated but using -e amdgpu:*
Created attachment 141635 [details]
Using libunwind
Comment on attachment 141635 [details]
Using libunwind
Ransu, please try to get the information from your system the same way Mike did. Looks like he's running into a different issue which only happens using the Xorg modesetting driver.
Created attachment 141907 [details]
report with amdgpu DDX
this is still happening with the amdgpu DDX
Created attachment 141908 [details]
report with amdgpu DDX with debugging
Created attachment 141910 [details]
mpv debugging log
Created attachment 141911 [details]
perf report with mpv debugging
Sorry for the lack of updates; life got in the way, and then this week I made a stupid mistake on where I was sending data with 'dd'. I didn't lose any important data, but it did force me to set up my laptop from scratch.

Good news! As of kernel 4.18.16 I no longer see an issue, knock on wood. I'm going to give it a week and report back, but the AMDGPU driver does not seem to be reloading like mad anymore.

Linux Y40-80 4.18.16-arch1-1-ARCH #1 SMP PREEMPT Sat Oct 20 22:06:45 UTC 2018 x86_64 GNU/Linux

I get the following two messages whenever I request the AMD GPU with "DRI_PRIME=1", and only when I request the AMD GPU. When I'm not making use of the discrete graphics and only use the integrated Intel GPU of my laptop, I do not see the two messages below repeated over and over anymore, nor do I see any stuttering.

> [drm] PCIE gen 2 link speeds already enabled
> amdgpu 0000:05:00.0: PCIE GART of 1024M enabled (table at 0x000000F400000000).

One other thing I should note: before I set up the system with AMDGPU by putting the kernel command line arguments "radeon.si_support=0 amdgpu.si_support=1" into my grub config, radeon alone would crash my system. I needed to go into a different TTY before logging into XFCE to get things set up, following this page: https://wiki.archlinux.org/index.php/AMDGPU

> Nov 08 00:56:06 Y40-80 kernel: radeon 0000:05:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0x000000006a0bf82f
> Nov 08 00:56:06 Y40-80 kernel: radeon 0000:05:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0x000000003d1f62f3
> Nov 08 00:56:06 Y40-80 kernel: radeon 0000:05:00.0: failed VCE resume (-22).
> Nov 08 00:56:07 Y40-80 kernel: [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
> Nov 08 00:56:07 Y40-80 kernel: [drm:si_resume [radeon]] *ERROR* si startup failed on resume
> Nov 08 00:56:22 Y40-80 kernel: [drm:atom_op_jump [radeon]] *ERROR* atombios stuck in loop for more than 5secs aborting
> Nov 08 00:56:22 Y40-80 kernel: [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing C078 (len 237, WS 0, PS 4) @ 0xC086
> Nov 08 00:56:22 Y40-80 kernel: [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing B99E (len 78, WS 12, PS 8) @ 0xB9D7

So now that things appear to be working I just have a few more questions.

Does this mean that the discrete GPU should be making use of power saving features and shouldn't be draining too much power if I'm not making use of it?

Does my card support AMDGPU-PRO drivers? If so, is there any real advantage of using the "PRO" extras over the standard open source driver?

Oh, I also added the kernel modules as follows to my mkinitcpio configuration in case that helped any. The first three are for the two GPUs my laptop has and the rest are for the encrypted disk.

MODULES="i915 amdgpu radeon dm_mod dm_crypt ext4 aes_x86_64 sha256 sha512"

(In reply to Ransu from comment #26)
>
> Does my card support AMDGPU-PRO drivers? If so is there any real advantage
> of using the "PRO" extras over the standard open source driver?

You only need the "PRO" driver if you need OpenGL that is certified for workstation applications or OpenCL.

(In reply to Mike Lothian from comment #22)
> Created attachment 141907 [details]
> report with amdgpu DDX
>
> this is still happening with the amdgpu DDX

Are you still having issues?

I'm closing this as fixed. The latest code is working for me.

I have now upgraded to kernel 4.19.2 in Arch Linux and things appear to continue to work as expected.

Linux Y40-80 4.19.2-arch1-1-ARCH #1 SMP PREEMPT Tue Nov 13 21:16:19 UTC 2018 x86_64 GNU/Linux
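To put a number on "repeated over and over", the amdgpu "PCIE GART of ... enabled" message quoted in the comment above can be counted in saved dmesg output. The sketch below is an illustration with hypothetical function names, and it assumes the default dmesg timestamp format of seconds since boot in square brackets. It collects the timestamp of every such line and the gaps between them. The same message is also printed once when the driver first loads, so the first hit is not necessarily a wake-up.

```python
import re

# Assumption: default dmesg timestamps, e.g. "[   10.500000] ...".
# The GART size varies per card, so any "<number>M" is accepted.
RESUME_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s.*amdgpu .*PCIE GART of \d+M enabled"
)


def resume_timestamps(dmesg_text):
    """Return the timestamp (seconds since boot) of each amdgpu GART enable line."""
    stamps = []
    for line in dmesg_text.splitlines():
        match = RESUME_RE.match(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def intervals(stamps):
    """Return the gaps in seconds between consecutive timestamps."""
    return [round(b - a, 6) for a, b in zip(stamps, stamps[1:])]
```

Feeding it the text of a saved `dmesg` capture gives a quick before/after comparison between kernels: gaps of a few seconds match the stuttering described at the top of the report, while an empty or very short list suggests the GPU is staying suspended.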