Summary: | Kernel invalid opcode on unbinding amdgpu | |
---|---|---|---
Product: | DRI | Reporter: | nospam
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel>
Status: | RESOLVED MOVED | QA Contact: |
Severity: | normal | |
Priority: | medium | CC: | e.yunak, jimijames.bove
Version: | unspecified | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | | i915 features: |
Description
nospam
2017-03-26 03:03:51 UTC
Me and many other people have been having this issue as well, and I only recently learned that freedesktop.org, NOT kernel.org, is the proper place to report it. Here's my bug report that's been getting ignored for almost a year and hopefully has extra information: https://bugzilla.kernel.org/show_bug.cgi?id=150731

I can confirm that the OS completely hangs when unbinding R9 380 (Tonga Pro) with X running. Works fine with X off.

Thought I'd add my post from the linked thread, so I can be updated.
------------------
I have amdgpu and vfio-pci both in kernel, used the following to unbind it.

#!/bin/bash
for dev in "$@"; do
    vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
    device=$(cat /sys/bus/pci/devices/$dev/device)
    if [ -e /sys/bus/pci/devices/$dev/driver ]; then
        echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
    fi
    echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
done

lspci -nnk shows:

03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga PRO [Radeon R9 285/380] [1002:6939] (rev f1)
    Subsystem: PC Partner Limited / Sapphire Technology Radeon R9 380 Nitro 4G D5 [174b:e308]
    Kernel driver in use: vfio-pci
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga HDMI Audio [Radeon R9 285/380] [1002:aad8]
    Subsystem: PC Partner Limited / Sapphire Technology Radeon R9 285/380 HDMI Audio [174b:aad8]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel

FWIW, I don't think unbinding is supposed to be possible while Xorg (or anything else) is using the GPU. Sounds like there's something missing somewhere to prevent that.

(In reply to Michel Dänzer from comment #3)
> FWIW, I don't think unbinding is supposed to be possible while Xorg (or
> anything else) is using the GPU. Sounds like there's something missing
> somewhere to prevent that.

Before I switched to AMD, I was passing an NVidia GPU (GTX 660) into my virtual machine, and I could unbind and rebind it between nouveau and vfio-pci as much as I wanted.
No trouble at all. Even while X was running, once DRI3 support came. I switched to AMD expecting the same functionality. Thankfully, not having said functionality isn't the end of the world, but having to reboot my computer every time I want to play a game in Windows right after playing a game in Linux is exactly the kind of pain that I spent a summer setting up the VM to avoid.

(In reply to jimijames.bove from comment #4)
> Before I switched to AMD, I was passing an NVidia GPU (GTX 660) into my
> virtual machine, and I could unbind and rebind it between nouveau and
> vfio-pci as much as I wanted. No trouble at all. Even while X was running,
> once DRI3 support came.

You can still do that with DRI3, you just have to prevent Xorg from using the secondary GPU, e.g. via

Section "ServerFlags"
    Option "AutoAddGPU" "off"
EndSection

in /etc/X11/xorg.conf.

(In reply to Michel Dänzer from comment #5)
> (In reply to jimijames.bove from comment #4)
> > Before I switched to AMD, I was passing an NVidia GPU (GTX 660) into my
> > virtual machine, and I could unbind and rebind it between nouveau and
> > vfio-pci as much as I wanted. No trouble at all. Even while X was running,
> > once DRI3 support came.
>
> You can still do that with DRI3, you just have to prevent Xorg from using
> the secondary GPU, e.g. via
>
> Section "ServerFlags"
>     Option "AutoAddGPU" "off"
> EndSection
>
> in /etc/X11/xorg.conf.

Well, sort of. That option is what allows me to bind the card to amdgpu without X crashing (even though I've been told in the past that I shouldn't need that option for that functionality), but this bug--not being able to UNbind it from amdgpu--does not go away with that option.

(In reply to jimijames.bove from comment #6)
> Well, sort of.
> That option is what allows me to bind the card to amdgpu
> without X crashing (even though I've been told in the past that I shouldn't
> need that option for that functionality), but this bug--not being able to
> UNbind it from amdgpu--does not go away with that option.

Actually, sorry, I just remembered, I *don't* need that option anymore to bind it to amdgpu while X is running. That did get fixed. But back then and also right now, it still doesn't fix this bug.

Make sure nothing else (e.g. gdm in Wayland mode) is using the GPU you're trying to unbind either. Something like

sudo lsof /dev/dri/*

shows which process is using which GPU device(s).

(In reply to Michel Dänzer from comment #8)
> Make sure nothing else (e.g. gdm in Wayland mode) is using the GPU you're
> trying to unbind either. Something like
>
> sudo lsof /dev/dri/*
>
> shows which process is using which GPU device(s).

I did that way back when I first discovered this bug. I'll do it again just to make sure when I'm back with the computer that has the virtual machine and AMD GPU in a couple weeks.

OK, that's weird. Running sudo lsof /dev/dri/* doesn't get me any info about the AMD card at all. I ran it at boot, when it's bound to vfio-pci (I set it up to be that way at boot), then I ran it after unbinding it from that and binding it to amdgpu, and then I ran it after attempting (and failing due to this bug) to unbind it from amdgpu. All 3 times, I got these lines, which are referring to my NVidia GT 740 (card0), and absolutely no lines that have anything to do with Xorg or any other video card:

COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
Xorg    597 root  mem    CHR  226,0          14346 /dev/dri/card0
Xorg    597 root  14u    CHR  226,0      0t0 14346 /dev/dri/card0
Xorg    597 root  16u    CHR  226,0      0t0 14346 /dev/dri/card0
Xorg    597 root  17u    CHR  226,0      0t0 14346 /dev/dri/card0

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/149.
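
For anyone landing here later, the lsof check suggested in comment #8 can be narrowed to the one device being unbound, roughly like the sketch below. The names drm_nodes_for, safe_unbind and SYSFS_PCI are made up for illustration, and the sketch assumes the kernel lists a bound device's DRM nodes under /sys/bus/pci/devices/<address>/drm/. It is only a diagnostic guard, not a workaround: in the last comment above, lsof showed nothing holding the AMD card and the unbind still failed.

```shell
#!/bin/bash
# Sketch only: drm_nodes_for, safe_unbind and SYSFS_PCI are hypothetical names.
# SYSFS_PCI can be overridden so the functions can be tried against a fake tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Print the /dev/dri nodes that belong to one PCI device (e.g. 0000:03:00.0).
# Assumes a bound DRM driver exposes them as <device>/drm/cardN and renderDN.
drm_nodes_for() {
    local dev="$1" node
    for node in "$SYSFS_PCI/$dev"/drm/card* "$SYSFS_PCI/$dev"/drm/renderD*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        if [ -e "$node" ]; then
            echo "/dev/dri/$(basename "$node")"
        fi
    done
}

# Unbind a device from its current driver unless lsof reports an open DRM node.
safe_unbind() {
    local dev="$1" nodes users
    nodes=$(drm_nodes_for "$dev")
    if [ -n "$nodes" ]; then
        command -v lsof >/dev/null 2>&1 || { echo "lsof not found" >&2; return 2; }
        # Judge by output, not exit status: lsof can exit non-zero when any
        # one of the listed files has no users. Run as root to see all processes.
        users=$(lsof $nodes 2>/dev/null)
        if [ -n "$users" ]; then
            echo "refusing to unbind $dev, still in use:" >&2
            echo "$users" >&2
            return 1
        fi
    fi
    if [ -e "$SYSFS_PCI/$dev/driver" ]; then
        echo "$dev" > "$SYSFS_PCI/$dev/driver/unbind"
    fi
}
```

Run as root, something like safe_unbind 0000:03:00.0 would take the place of the bare echo into driver/unbind in the script earlier in this thread; the new_id write to vfio-pci stays as it was.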