Bug 102800

Summary: DRI_PRIME regression - radeon: Failed to allocate virtual address for buffer
Product: DRI
Reporter: higuita
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
Severity: normal
Priority: medium
CC: bizyaev, samuel
Version: XOrg git
Hardware: Other
OS: All
Attachments:
Description | Flags
dmesg on boot | none
dmesg after turning off and on the radeon card | none
Xorg.0.log after the Off and On | none
workaround to test | none
workaround to test | none
Dmesg with patch | none

Description higuita 2017-09-15 20:01:29 UTC
Using Ubuntu 17.04 with kernel 4.13.2, mesa 1.3-git and libdrm 2.4.83 on a Lenovo ThinkPad S440 with an Intel Haswell and a Radeon HD8670M/8690M.

Running these commands, I get an error:

+ xrandr --setprovideroffloadsink 0x3f 0x66
+ DRI_PRIME=1
+ glxgears
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x800000
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x800000
radeonsi: Failed to create a context.
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x800000
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x800000
radeonsi: Failed to create a context.
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  155 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  31
  Current serial number in output stream:  33

In dmesg, I can see this:


[ 1059.004670] [drm:atom_op_jump [radeon]] *ERROR* atombios stuck in loop for more than 5secs aborting
[ 1059.004693] [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing 746C (len 237, WS 0, PS 4) @ 0x747A
[ 1059.004703] [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing 6E04 (len 74, WS 0, PS 8) @ 0x6E39
[ 1059.012106] [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[ 1059.012110] [drm] PCIE gen 2 link speeds already enabled
[ 1059.448659] [UFW ALLOW] IN= OUT=wlan0 SRC=10.42.42.80 DST=140.172.138.79 LEN=40 TOS=0x00 PREC=0x00 TTL=64 ID=44245 DF PROTO=TCP SPT=50408 DPT=80 WINDOW=30498 RES=0x00 ACK FIN URGP=0 
[ 1059.476638] radeon 0000:06:00.0: Wait for MC idle timedout !
[ 1059.708499] radeon 0000:06:00.0: Wait for MC idle timedout !
[ 1059.714600] [drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
[ 1059.714727] radeon 0000:06:00.0: WB enabled
[ 1059.714730] radeon 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8c2c59b12c00
[ 1059.714731] radeon 0000:06:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8c2c59b12c04
[ 1059.714732] radeon 0000:06:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8c2c59b12c08
[ 1059.714732] radeon 0000:06:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8c2c59b12c0c
[ 1059.714733] radeon 0000:06:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8c2c59b12c10
[ 1060.424258] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[ 1060.424283] [drm:si_resume [radeon]] *ERROR* si startup failed on resume

All of this worked fine with previous kernel versions (IIRC, 4.11 and below) and started to fail in 4.12 and above.

I also noticed that the dedicated card switches from DynOff to DynPwr in /sys/kernel/debug/vgaswitcheroo/switch for a few seconds when I try to run glxgears.

Finally, if I boot the system with radeon.runpm=1, it works.

Let me know if you need more logs.
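
For anyone reproducing this, the DynOff/DynPwr transition mentioned above can be watched by parsing /sys/kernel/debug/vgaswitcheroo/switch. The following is a minimal sketch, assuming the usual colon-separated line layout (index, client id, active flag, power state, PCI address). The sample text is illustrative: 0000:06:00.0 is the radeon address from the dmesg above, while the Intel address 0000:00:02.0 is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwitcherooClient:
    index: int
    client_id: str    # e.g. "IGD" (integrated) or "DIS" (discrete)
    active: bool      # "+" marks the currently active client
    power_state: str  # e.g. "Pwr", "Off", "DynPwr", "DynOff"
    pci_address: str  # e.g. "0000:06:00.0"


def parse_switch(text: str) -> List[SwitcherooClient]:
    """Parse the contents of the vgaswitcheroo 'switch' debugfs file."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The PCI address itself contains colons, so split only four times.
        index, client_id, active, power, pci = line.split(":", 4)
        clients.append(
            SwitcherooClient(int(index), client_id, active == "+", power, pci.strip())
        )
    return clients


# Illustrative sample, not taken from the reporter's machine.
SAMPLE = "0:IGD:+:Pwr:0000:00:02.0\n1:DIS: :DynOff:0000:06:00.0\n"

if __name__ == "__main__":
    for client in parse_switch(SAMPLE):
        print(client)
```

On a real system the text would come from reading the debugfs file as root, polled while glxgears starts, instead of from SAMPLE.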
Comment 1 Alex Deucher 2017-09-15 20:08:37 UTC
(In reply to higuita from comment #0)
> 
> 
> Finally, if I boot the system with radeon.runpm=1, it works.
> 
> Let me know if you need more logs

runpm is enabled by default so specifying it shouldn't change anything.  Please attach your xorg log and dmesg output.
Comment 2 higuita 2017-09-15 20:16:00 UTC
Sorry, radeon.runpm=0 does not work either; the card was powered off, so it was using the Intel card.

Enabling the dedicated card still fails with the same error.

dmesg shows this when enabling the card:

[  180.869486] radeon: switched on
[  180.898877] [drm] probing gen 2 caps for device 8086:9c18 = 5323c42/0
[  180.898881] [drm] PCIE gen 2 link speeds already enabled
[  180.905022] [drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
[  180.905150] radeon 0000:06:00.0: WB enabled
[  180.905154] radeon 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff891099a99c00
[  180.905155] radeon 0000:06:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff891099a99c04
[  180.905157] radeon 0000:06:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff891099a99c08
[  180.905158] radeon 0000:06:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff891099a99c0c
[  180.905159] radeon 0000:06:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff891099a99c10
[  181.613461] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[  181.613478] [drm:si_resume [radeon]] *ERROR* si startup failed on resume
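
The failure signature in these logs is a pair of radeon *ERROR* lines (ring test failure, then si_resume). A minimal sketch for pulling such lines out of a saved dmesg dump follows; the regular expression is an assumption based only on the line format shown in this report, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Matches lines such as:
# [  181.613478] [drm:si_resume [radeon]] *ERROR* si startup failed on resume
ERROR_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:(?P<func>\w+) \[radeon\]\] \*ERROR\* (?P<msg>.*)$"
)


def radeon_errors(dmesg_text: str) -> List[Tuple[float, str, str]]:
    """Return (timestamp, kernel function, message) for each radeon DRM error line."""
    found = []
    for line in dmesg_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            found.append((float(match["ts"]), match["func"], match["msg"]))
    return found


# Lines copied from the dmesg excerpt in this report.
SAMPLE = """\
[  180.869486] radeon: switched on
[  181.613461] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[  181.613478] [drm:si_resume [radeon]] *ERROR* si startup failed on resume
"""

if __name__ == "__main__":
    for ts, func, msg in radeon_errors(SAMPLE):
        print(f"{ts:12.6f}  {func}: {msg}")
```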
Comment 3 higuita 2017-09-15 20:29:04 UTC
Created attachment 134265 [details]
dmesg on boot

So sorry again :)

runpm=0 does work, but after doing "echo OFF > /sys/kernel/debug/vgaswitcheroo/switch", it stops working.

So I am attaching the dmesg on boot and a dmesg after turning the card OFF and ON again. Also the Xorg.0.log, but it didn't output anything useful (some EDI modeset lines).
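
The OFF/ON cycle described above can be scripted for repeated testing. This is only a sketch, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root; the function name and the two-second delay are hypothetical choices.

```python
import time
from pathlib import Path

SWITCH = Path("/sys/kernel/debug/vgaswitcheroo/switch")


def power_cycle(switch: Path = SWITCH, delay: float = 2.0) -> str:
    """Write OFF, then ON, to the switch file and return its contents afterwards."""
    switch.write_text("OFF\n")  # same effect as: echo OFF > .../switch
    time.sleep(delay)
    switch.write_text("ON\n")   # same effect as: echo ON > .../switch
    time.sleep(delay)
    return switch.read_text()


if __name__ == "__main__":
    print(power_cycle())
```

After the cycle, dmesg should be checked for the "si startup failed on resume" error shown in comment 2.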
Comment 4 higuita 2017-09-15 20:30:09 UTC
Created attachment 134266 [details]
dmesg after turning off and on the radeon card
Comment 5 higuita 2017-09-15 20:30:58 UTC
Created attachment 134267 [details]
Xorg.0.log after the Off and On
Comment 6 Michel Dänzer 2017-10-04 15:58:13 UTC
(In reply to higuita from comment #0)
> All of this worked fine with previous kernel versions (IIRC, 4.11 and
> below) and started to fail in 4.12 and above.

Any chance you can bisect between 4.11 and 4.12?
Comment 7 higuita 2017-10-09 23:18:48 UTC
I tried downgrading the kernel to 4.11, 4.10 and 4.9, and each time I failed to enable DRI_PRIME on the dedicated card, always with the same error. I then tried downgrading mesa and libdrm to the distro's original versions and also failed to enable the card...

So yes, the card worked in the past with DynPwr, but now I'm unable to enable it... Maybe some other update (firmware or something else) is also interfering. I do recall being able to boot an older kernel and have the card working while booting the more recent one failed... all this was around June.

I will try booting from a live CD to get it working again and to understand what else is causing this.

Anyway, using runpm=0 works, and the problem always seems to be the kernel/atombios being unable to re-enable the card after shutting it down.
Comment 8 Alex Deucher 2017-10-10 13:25:07 UTC
Created attachment 134779 [details] [review]
workaround to test

Does this patch fix the issue?
Comment 9 higuita 2017-10-10 23:50:05 UTC
Yes, with the patch applied, I can use the radeon card without runpm=0. Also, turning the dedicated GPU OFF and ON now works fine.
Comment 10 higuita 2017-11-06 03:38:03 UTC
So will kernel 4.14 bring any fix for this? Or do you need more info about my system?
Comment 11 higuita 2017-11-13 20:12:17 UTC
I just tested kernel 4.14.0 (without the above patch) and it still fails.

I will apply the patch again as a workaround.
Comment 12 Alex Deucher 2017-11-13 21:02:27 UTC
Can you verify that the Linux pci subsystem is properly calling the ACPI _PR3 method on your platform?  The workaround just uses the legacy AMD ACPI interface.  It appears pci is not calling _PR3 properly for some reason on your platform.
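
One indirect check, sketched below, is to read the runtime PM state of the GPU and of the PCIe port above it from sysfs. This does not show whether _PR3 was actually evaluated; it only shows whether the port is allowed to runtime suspend and what state it reports. The PCI address default is the radeon device from the logs in this report, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict


def runtime_pm_status(device: Path) -> Dict[str, str]:
    """Read the runtime PM attributes of one sysfs device directory."""
    result = {}
    for name in ("control", "runtime_status"):
        attr = device / "power" / name
        result[name] = attr.read_text().strip() if attr.exists() else "unavailable"
    return result


def gpu_and_port_status(
    bdf: str = "0000:06:00.0",
    root: Path = Path("/sys/bus/pci/devices"),
) -> Dict[str, Dict[str, str]]:
    """Report runtime PM state for the GPU and its upstream PCIe port."""
    # Entries under /sys/bus/pci/devices are symlinks; once resolved, the
    # parent directory of the device is the bridge/port it sits behind.
    gpu = (root / bdf).resolve()
    return {
        "gpu": runtime_pm_status(gpu),
        "upstream_port": runtime_pm_status(gpu.parent),
    }


if __name__ == "__main__":
    for name, status in gpu_and_port_status().items():
        print(name, status)
```

If the upstream port never reports "suspended" while the GPU is idle, the platform power resources are probably not being switched off through the port, which would be consistent with the behaviour described here.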
Comment 13 higuita 2017-11-17 02:04:20 UTC
Can you give me any pointers on how to "call the ACPI _PR3 method"?

I already installed acpi_call, but have no idea how to use it.
Comment 14 Alex Deucher 2017-11-21 23:40:02 UTC
(In reply to higuita from comment #13)
> Can you give me any pointers on how to "call the ACPI _PR3 method"?
> 
> I already installed acpi_call, but have no idea how to use it.

The pci core should be doing it for you since runtime pm support was added to pcie ports:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9d26d3a8f1b0c442339a235f9508bdad8af91043
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=006d44e49a259b39947366728d65a873a19aadc0

Does adding:
pcie_port_pm=force
to the kernel command line in grub help?
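
After rebooting, it is worth confirming that the parameter actually reached the kernel. A minimal sketch that parses /proc/cmdline follows; the function name is hypothetical, and the simple whitespace split does not handle quoted values containing spaces.

```python
from typing import Dict, Optional


def cmdline_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    with open("/proc/cmdline") as handle:
        params = cmdline_params(handle.read())
    print("pcie_port_pm =", params.get("pcie_port_pm", "<not set>"))
```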
Comment 15 higuita 2017-11-24 22:22:56 UTC
No, with pcie_port_pm=force and 4.14.0, it still fails.
Comment 16 Alex Deucher 2017-12-19 04:16:16 UTC
Created attachment 136269 [details] [review]
workaround to test

To narrow things down further, does this patch also work?
Comment 17 higuita 2018-01-11 23:14:45 UTC
Sorry for the delay!

Yes, it works! After applying the patch, I can use the radeon card normally without runpm=0.
Comment 18 Alex Deucher 2018-01-12 16:22:07 UTC
Can you attach your dmesg output?
Comment 19 higuita 2018-01-17 18:21:20 UTC
Created attachment 136810 [details]
Dmesg with patch

Here is the dmesg with the patch applied
Comment 20 Martin Peres 2019-11-19 09:30:54 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/821.
