Bug 87649

Summary: After archlinux system update switching on both of internal and external graphics card does not work anymore (vgaswitcheroo)
Product: DRI
Component: DRM/Radeon
Status: RESOLVED MOVED
Severity: major
Priority: medium
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: gofabian
Assignee: Default DRI bug account <dri-devel>
QA Contact:
CC: intel-gfx-bugs
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
- dmesg output with error logging (flags: none)
- dmesg (reproduced issue) (flags: none)
- /sys/class/drm/card1/error (reproduced issue) (flags: none)
- dmesg v3.19rc2 (flags: none)
- dmesg_v3.19rc2_without_switch_refuse (flags: none)

Description gofabian 2014-12-23 20:11:24 UTC
Created attachment 111236
dmesg output with error logging

Hello,

After I updated Arch Linux with 'pacman -Syu', I can no longer switch on both graphics cards (AMD + Intel) concurrently via vgaswitcheroo. I need both active in parallel because the Intel card drives the notebook screen and the AMD card drives my external screen.

The default graphics card is the Intel one. This configuration works well (notebook screen only). I have another GRUB boot option that switches on the dedicated graphics card before the display manager is started:

echo ON > /sys/kernel/debug/vgaswitcheroo/switch
echo DIS > /sys/kernel/debug/vgaswitcheroo/switch

When I add a further 'echo OFF > ...' command, the Intel graphics card is not active and the error does not happen. But in this case I have no output on my notebook screen.
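To see which card is powered and which one currently owns the multiplexer before and after these commands, the same debugfs file can be read back. The following is only a sketch: it assumes the file prints one line per client in the form "id:type:active:power:pci-address" (for example "0:IGD:+:Pwr:0000:00:02.0"), which should be checked against the running kernel. The function name switcheroo_status is made up for this example.

```shell
#!/bin/sh
# Sketch: summarize vga_switcheroo client state.
# Assumption: each line looks like "0:IGD:+:Pwr:0000:00:02.0"
# (id, type, '+' if the mux points at this client, power state, PCI address).
switcheroo_status() {
    file="${1:-/sys/kernel/debug/vgaswitcheroo/switch}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file (is debugfs mounted, are you root?)" >&2
        return 1
    fi
    # The last variable receives the rest of the line, so the PCI
    # address keeps its own colons.
    while IFS=: read -r id type active power pci; do
        [ -n "$id" ] || continue
        if [ "$active" = "+" ]; then mux="active"; else mux="inactive"; fi
        echo "client=$id type=$type mux=$mux power=$power pci=$pci"
    done < "$file"
}
```

Calling switcheroo_status as root after each echo shows whether the ON/OFF/DIS command had the intended effect.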

I had a look at the dmesg output and it told me to create a bug ticket ;-)

Error: [drm] GPU HANG: ecode 0:0x85ffaafc, in Xorg.bin [1080], reason: Ring hung, action: reset

Notebook: HP Envy 14-1101eg

Best regards
Fabian
Comment 1 Jani Nikula 2014-12-29 08:16:47 UTC
(In reply to fabian.ifflaender from comment #0)
> I had a look at the dmesg output and it told me to create a bug ticket ;-)

It also told you to attach the gpu crash dump! ;)

[  176.447822] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  176.447828] [drm] GPU crash dump saved to /sys/class/drm/card1/error
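For anyone else hitting this: the crash dump can be copied out of sysfs right after the hang, before a reboot discards it. A minimal sketch follows; the path comes from the dmesg line above (the card index may differ on other machines), and the function name and default output file name are just placeholders.

```shell
#!/bin/sh
# Sketch: copy the i915 error state to a regular file for attaching.
# $1: error state file (default taken from the dmesg message above)
# $2: destination file (hypothetical default name with a timestamp)
save_gpu_error() {
    src="${1:-/sys/class/drm/card1/error}"
    dest="${2:-gpu-error-$(date +%Y%m%d-%H%M%S).txt}"
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    # Read the file as a stream and print the destination on success.
    cat "$src" > "$dest" && echo "$dest"
}
```

Run it as root, then attach the resulting file to the bug.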
Comment 2 gofabian 2015-01-03 20:10:18 UTC
Created attachment 111697
dmesg (reproduced issue)
Comment 3 gofabian 2015-01-03 20:11:17 UTC
Created attachment 111698
/sys/class/drm/card1/error (reproduced issue)
Comment 4 gofabian 2015-01-03 20:12:02 UTC
Hello,

I needed some time to reproduce the issue. My description was not completely correct.

Steps to reproduce:


(1) Use the GRUB boot option to switch the hardware multiplexer to the discrete card before the window manager starts:


echo ON > /sys/kernel/debug/vgaswitcheroo/switch
echo DIS > /sys/kernel/debug/vgaswitcheroo/switch



(2) Toggle integrated card


echo ON > /sys/kernel/debug/vgaswitcheroo/switch
-> dmesg output:
[   14.326076] i915: switched on


echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
-> dmesg output:
[  119.977018] i915: switched off


echo ON > /sys/kernel/debug/vgaswitcheroo/switch
-> dmesg output:
[  119.977018] i915: switched off
[  120.261682] ------------[ cut here ]------------
[  120.261695] WARNING: CPU: 2 PID: 1130 at drivers/pci/pci.c:1535 pci_disable_device+0x99/0xb0()
[  120.261698] i915 0000:00:02.0: disabling already-disabled device
[  120.261700] Modules linked in:
[  120.261703]  fuse ctr ccm hp_wmi sparse_keymap arc4 brcmsmac cordic brcmutil b43 mac80211 cfg80211 ssb mmc_core rng_core pcmcia pcmcia_core iTCO_wdt iTCO_vendor_support ecb btusb coretemp intel_powerclamp bluetooth kvm_intel rfkill kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev media mousedev joydev psmouse serio_raw r8169 i2c_i801 bcma snd_hda_codec_idt intel_ips mii lpc_ich snd_hda_codec_generic snd_hda_codec_hdmi wmi fan snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm hp_accel snd_timer led_class lis3lv02d snd input_polldev battery ac thermal evdev mac_hid mei_me intel_agp mei acpi_cpufreq shpchp processor soundcore
[  120.261774]  sch_fq_codel ext4 crc16 mbcache jbd2 hid_logitech_dj usbhid hid sd_mod sr_mod crc_t10dif crct10dif_common cdrom atkbd libps2 ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common i8042 serio i915 button intel_gtt video radeon hwmon i2c_algo_bit drm_kms_helper ttm drm i2c_core
[  120.261808] CPU: 2 PID: 1130 Comm: bash Not tainted 3.17.6-1-ARCH #1
[  120.261811] Hardware name: Hewlett-Packard HP ENVY 14 Notebook PC          /1436, BIOS F.23 11/11/2010
[  120.261813]  0000000000000000 000000000bb54064 ffff88008639fd60 ffffffff81537c3e
[  120.261817]  ffff88008639fda8 ffff88008639fd98 ffffffff8107079d ffff880151852000
[  120.261820]  ffff880151b4f1b0 0000000000000002 ffff88008639ff48 0000000000000000
[  120.261824] Call Trace:
[  120.261831]  [<ffffffff81537c3e>] dump_stack+0x4d/0x6f
[  120.261838]  [<ffffffff8107079d>] warn_slowpath_common+0x7d/0xa0
[  120.261842]  [<ffffffff8107081c>] warn_slowpath_fmt+0x5c/0x80
[  120.261869]  [<ffffffffa028371a>] ? intel_display_set_init_power+0x2a/0x50 [i915]
[  120.261876]  [<ffffffff812e4be9>] pci_disable_device+0x99/0xb0
[  120.261889]  [<ffffffffa022ac3c>] i915_suspend+0x5c/0xb0 [i915]
[  120.261906]  [<ffffffffa02bc96b>] i915_switcheroo_set_state+0x3b/0x90 [i915]
[  120.261911]  [<ffffffff813a51d8>] vga_switchoff.part.2+0x18/0x40
[  120.261916]  [<ffffffff813a5883>] vga_switcheroo_debugfs_write+0x303/0x3b0
[  120.261921]  [<ffffffff811ca1e8>] ? __sb_start_write+0x58/0x110
[  120.261927]  [<ffffffff812652a3>] ? security_file_permission+0x23/0xa0
[  120.261931]  [<ffffffff811c7967>] vfs_write+0xb7/0x200
[  120.261934]  [<ffffffff811c85d9>] SyS_write+0x59/0xd0
[  120.261939]  [<ffffffff8153dc69>] system_call_fastpath+0x16/0x1b
[  120.261942] ---[ end trace 63b5ef02d0e89d31 ]---


echo ON > /sys/kernel/debug/vgaswitcheroo/switch
-> dmesg output:
[  139.628835] i915: switched on



(3) Prepare switch to integrated card


echo DIGD > /sys/kernel/debug/vgaswitcheroo/switch
-> dmesg output:
[  146.237656] vga_switcheroo: client 1 refused switch
[  146.237662] vga_switcheroo: setting delayed switch to client 0



(4) Restart X (Ctrl + Alt + Del)

-> dmesg output:
[  172.751769] vga_switcheroo: processing delayed switch to 0
[  172.751784] snd_hda_intel 0000:01:00.1: Disabling via VGA-switcheroo
[  173.134994] fbcon: Remapping primary device, fb0, to tty 1-63
[  180.531979] [drm] stuck on render ring
[  180.535219] [drm] GPU HANG: ecode 0:0x85ffaafc, in Xorg.bin [1246], reason: Ring hung, action: reset
[  180.535309] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  180.535347] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  180.535349] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  180.535351] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  180.535367] [drm] GPU crash dump saved to /sys/class/drm/card1/error
[  180.535390] [drm:intel_pipe_set_base] *ERROR* pin & fence failed
[  181.042044] [drm:i915_reset] *ERROR* Failed to reset chip: -110


I attached the logfiles.

Thanks for your support!
Fabian
Comment 5 Imre Deak 2015-01-07 15:16:58 UTC
In the dmesg I can also see

[    8.223153] i915: switched off

Looking at the 3.17.6 kernel it seems we won't re-enable the PCI device afterwards in response to a switcheroo ON command. Could you try v3.19-rc2 or git://anongit.freedesktop.org/drm-intel ? There is a related i915 switcheroo fix in those that should address this issue.
Comment 6 gofabian 2015-01-09 21:39:46 UTC
I tried the kernel version 3.19.0-031900rc2-generic. With this kernel I cannot switch the hardware multiplexer via vga_switcheroo to the external card at all.

(A further difference is that the "Reverse PRIME" feature is more stable now.)

I attached the dmesg output.
Comment 7 gofabian 2015-01-09 21:40:35 UTC
Created attachment 112034
dmesg v3.19rc2
Comment 8 Imre Deak 2015-01-09 22:35:44 UTC
(In reply to gofabian from comment #7)
> Created attachment 112034
> dmesg v3.19rc2

Based on the "vga_switcheroo: client x refused switch" message, I assume some user space app keeps the relevant device nodes open, so switching isn't possible at that time. Did you make sure all relevant apps (desktop, PulseAudio, etc.) are stopped before switching?
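One way to check which processes still hold such nodes open is to walk /proc and compare every open file descriptor against a path prefix. This is only a sketch: the function name holders is made up, and which nodes matter here (/dev/dri/ for the GPUs, /dev/snd/ for the HDMI audio device) is an assumption. Tools such as fuser or lsof report the same information where they are installed.

```shell
#!/bin/sh
# Sketch: list processes that have files open under the given path prefixes.
# Usage (as root, to see other users' processes): holders /dev/dri/ /dev/snd/
# Output: one "PID COMMAND PATH" line per match.
holders() {
    for fd in /proc/[0-9]*/fd/*; do
        # Descriptors of processes we may not inspect are skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        for prefix in "$@"; do
            case "$target" in
                "$prefix"*)
                    pid=${fd#/proc/}
                    pid=${pid%%/*}
                    name=$(cat "/proc/$pid/comm" 2>/dev/null)
                    echo "$pid $name $target"
                    ;;
            esac
        done
    done | sort -u
}
```

Anything this lists needs to be stopped before the switch is requested.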
Comment 9 gofabian 2015-01-10 09:36:14 UTC
I tried a delayed switch while the window manager was active:

echo DDIS > /sys/kernel/debug/vgaswitcheroo/switch

result:
[  143.760681] vga_switcheroo: client 1 refused switch
[  143.760687] vga_switcheroo: setting delayed switch to client 1

Afterwards I restarted the window manager (logout, login).

result:
[  165.888245] vga_switcheroo: processing delayed switch to 1
[  165.888254] vga_switcheroo: client 1 refused switch
[  165.912704] vga_switcheroo: processing delayed switch to 1
[  165.912708] vga_switcheroo: client 101 refused switch
[  166.955699] vga_switcheroo: processing delayed switch to 1
[  166.955706] vga_switcheroo: client 0 refused switch
[  166.955728] vga_switcheroo: processing delayed switch to 1
[  166.955730] vga_switcheroo: client 0 refused switch
[  166.956444] vga_switcheroo: processing delayed switch to 1
[  166.956449] vga_switcheroo: client 0 refused switch

Are there any further steps I should take besides restarting the window manager? I thought restarting the WM would be sufficient.

Do you think the PRIME feature may keep the device nodes open?
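To see at a glance which clients keep refusing, the dmesg output can be reduced to a count per client id. A small sketch, assuming the message text stays exactly as in the log above; the function name refused_clients is made up for this example.

```shell
#!/bin/sh
# Sketch: count "client N refused switch" messages per client id.
# Reads kernel log text on stdin, e.g.:  dmesg | refused_clients
refused_clients() {
    sed -n 's/.*vga_switcheroo: client \([0-9a-f]*\) refused switch.*/\1/p' |
        sort |
        uniq -c |
        awk '{ print "client " $2 ": " $1 " refusal(s)" }'
}
```

For the restart log above this reports three refusals for client 0 and one each for clients 1 and 101.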
Comment 10 gofabian 2015-01-10 13:59:27 UTC
(In reply to Imre Deak from comment #8)
> Based on the "vga_switcheroo: client x refused switch" message I assume some
> user space app keeps the relevant device nodes open and switching isn't
> possible at that time. Did you make sure all interesting apps are stopped
> (desktop, pulse audio etc.) before switching?

You were right. The pulseaudio process was still running. I tried this:

- Stop mdm

- killall pulseaudio

- echo DDIS > /sys/kernel/debug/vgaswitcheroo/switch

- Start mdm

i915 no longer refuses to switch. BUT: the screens stay black. The dmesg output shows errors from the radeon driver.
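For reference, the four steps can be put into one script. This is a sketch under assumptions: that the display manager runs as a systemd service named mdm (adjust for other setups), and that pulseaudio is not respawned between the kill and the switch. The helper names run and switch_to_discrete are made up; DRY_RUN=1 only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: stop the display manager, kill pulseaudio, request the switch,
# start the display manager again. Must run as root when DRY_RUN is unset.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# $1: display manager service (assumed name: mdm)
# $2: switcheroo control file
switch_to_discrete() {
    dm="${1:-mdm}"
    sw="${2:-/sys/kernel/debug/vgaswitcheroo/switch}"
    run systemctl stop "$dm" || return 1
    run killall pulseaudio || true   # not running is fine
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ echo DDIS > $sw"
    else
        echo DDIS > "$sw" || return 1
    fi
    run systemctl start "$dm"
}
```

Running it once with DRY_RUN=1 first shows exactly what would be executed.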
Comment 11 gofabian 2015-01-10 14:00:16 UTC
Created attachment 112056
dmesg_v3.19rc2_without_switch_refuse
Comment 12 Imre Deak 2015-01-12 22:38:37 UTC
(In reply to gofabian from comment #11)
> Created attachment 112056
> dmesg_v3.19rc2_without_switch_refuse

Looks like a radeon issue. Could you reassign this bug to the radeon people so they can have a look?
Comment 13 Alex Deucher 2015-06-04 13:21:33 UTC
(In reply to gofabian from comment #11)
> Created attachment 112056
> dmesg_v3.19rc2_without_switch_refuse

It's a radeon GPU hang.  Is there something specific that you were doing to trigger it?  E.g., running some app?
Comment 14 Martin Peres 2019-11-19 08:59:33 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/565.
