Bug 81066 - [r600g] Second Life causes GPU to lock up sometimes with DRI_PRIME
Summary: [r600g] Second Life causes GPU to lock up sometimes with DRI_PRIME
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-08 20:55 UTC by Shawn Starr
Modified: 2014-11-10 05:10 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments
Kernel stack dump (82.36 KB, text/plain)
2014-07-08 20:55 UTC, Shawn Starr

Description Shawn Starr 2014-07-08 20:55:16 UTC
Created attachment 102452
Kernel stack dump

kernel: 3.16.0-0.rc4.git0.1.fc21.1.x86_64
mesa: 0.3.0-2.20140707.fc21.x86_64 (git master)
Xorg: 1.15.99.904-1.fc21.x86_64 (git master, no 904 build yet)

kernel command boot options: BOOT_IMAGE=/vmlinuz-3.16.0-0.rc4.git0.1.fc21.1.x86_64 root=/dev/mapper/fedora_segfault-root ro vconsole.keymap=us rhgb slub_debug=- cgroup_disable=memory console=tty0 console=ttyS0,115200n8 radeon.runpm=0 radeon.dpm=1 radeon.hard_reset=1 intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 LANG=en_CA.UTF-8

Kernel dump from the lockup: see the attachment.

1) I am trying to use the PCI hard_reset option of radeon.ko.
2) pciehp decides to yank the GPU driver while it is doing a PCI reset (!)
3) After the GPU is reset, it is left in an unfinished startup state.

This all happened when I tried to play Second Life with the Radeon GPU as the offload renderer for the Intel GM45 GPU, which drives the display.

To run Second Life I'm using:

$ LIBGL_DRI3_DISABLE=1
$ xrandr --setprovideroffloadsink 0x55 0xa2
$ DRI_PRIME=1 ./singularity >& /dev/null &

Here is the Xrandr providers list:

Providers: number : 3
Provider 0: id: 0xa2 cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 3 outputs: 4 associated providers: 2 name:Intel
Provider 1: id: 0x55 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 2 outputs: 4 associated providers: 2 name:radeon
Provider 2: id: 0x55 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 2 outputs: 4 associated providers: 2 name:radeon
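
In the provider list above, the radeon provider is listed twice with the same id (0x55). As a rough sketch, assuming the `xrandr --listproviders` text format shown in this report, duplicates like this could be spotted with a few lines of Python. The names `parse_providers` and `duplicate_ids` are made up for this illustration and are not part of any xrandr tooling.

```python
import re
from collections import Counter

# Provider list copied from this report.
PROVIDERS = """\
Providers: number : 3
Provider 0: id: 0xa2 cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 3 outputs: 4 associated providers: 2 name:Intel
Provider 1: id: 0x55 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 2 outputs: 4 associated providers: 2 name:radeon
Provider 2: id: 0x55 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 2 outputs: 4 associated providers: 2 name:radeon
"""

# One match per "Provider N:" line; the "Providers:" header does not match.
PROVIDER_RE = re.compile(r"^Provider (\d+): id: (0x[0-9a-f]+) .* name:(\S+)$", re.M)


def parse_providers(text):
    """Return a list of (index, id, name) tuples, one per provider line."""
    return [(int(idx), int(pid, 16), name)
            for idx, pid, name in PROVIDER_RE.findall(text)]


def duplicate_ids(providers):
    """Return the sorted provider ids that occur more than once."""
    counts = Counter(pid for _, pid, _ in providers)
    return sorted(pid for pid, n in counts.items() if n > 1)


if __name__ == "__main__":
    for pid in duplicate_ids(parse_providers(PROVIDERS)):
        print(f"provider id {pid:#x} is listed more than once")
```

Whether the duplicate entry is related to the lockup is not established in this report; the sketch only makes the duplication easy to see.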
Comment 1 Shawn Starr 2014-07-09 14:09:11 UTC
Some additional information: when playing Second Life in a full-size window (not fullscreen), restoring the window to its previous size produces several DMAR faults to the Intel GM45 GPU.

DMAR:[fault reason 05] PTE Write access is not set
[ 1141.208920] dmar: DRHD: handling fault status reg 3
[ 1141.209061] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e76a3000
DMAR:[fault reason 05] PTE Write access is not set
[ 1141.209456] dmar: DRHD: handling fault status reg 3
[ 1141.209617] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e76f2000
DMAR:[fault reason 05] PTE Write access is not set
[ 1141.210009] dmar: DRHD: handling fault status reg 3
[ 1141.210159] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e7745000
DMAR:[fault reason 05] PTE Write access is not set
[ 1141.210547] dmar: DRHD: handling fault status reg 3
[ 1141.210697] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e779a000
DMAR:[fault reason 05] PTE Write access is not set
[ 1141.211093] dmar: DRHD: handling fault status reg 3
[ 1141.211241] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e77e2000
DMAR:[fault reason 05] PTE Write access is not set
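
Every fault above names the same requesting device, 01:00.0, which the radeon messages later in this report identify as the Radeon GPU (`radeon 0000:01:00.0`). A minimal Python sketch, assuming the log format shown here, that extracts the access type, device and fault address from such lines. `parse_dmar_faults` is a hypothetical helper name, and the sample holds only the first three fault lines from the log.

```python
import re

# First three fault lines, copied from the log above.
LOG = """\
[ 1141.209061] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e76a3000
[ 1141.209617] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e76f2000
[ 1141.210159] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e7745000
"""

FAULT_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] fault addr ([0-9a-f]+)"
)


def parse_dmar_faults(text):
    """Return (access, device, address) for each DMA fault line in text."""
    return [(access, dev, int(addr, 16))
            for access, dev, addr in FAULT_RE.findall(text)]


if __name__ == "__main__":
    for access, dev, addr in parse_dmar_faults(LOG):
        print(f"{access} fault from {dev} at {addr:#x}")
```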

With the PCIe hotplug driver disabled, this is how the GPU resets now (I had Chrome running with GPU acceleration, but it failed to render to the screen when I minimized and restored it):

[ 1369.585123] radeon 0000:01:00.0: ring 0 stalled for more than 10266msec
[ 1369.585424] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000005158f last fence id 0x000000000005158d on ring 0)
[ 1369.585943] radeon 0000:01:00.0: failed to get a new IB (-35)
[ 1369.586232] [drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
[ 1369.806564] radeon 0000:01:00.0: Saved 9945 dwords of commands on ring 0.
[ 1369.806764] radeon 0000:01:00.0: GPU softreset: 0x00000008
[ 1369.806919] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[ 1369.807139] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[ 1369.807408] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
[ 1369.807633] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 1369.807843] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[ 1369.808049] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000006
[ 1369.808278] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80000645
[ 1369.808481] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[ 1369.872186] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00004001
[ 1369.872523] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[ 1369.874896] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[ 1369.875215] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[ 1369.875519] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
[ 1369.875818] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[ 1369.876138] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[ 1369.876436] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[ 1369.876736] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
[ 1369.877058] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[ 1369.877365] radeon 0000:01:00.0: GPU pci config reset
[ 1370.000857] Watchdog[2701]: segfault at 0 ip 00007fc671897b1e sp 00007fc6590d9670 error 6 in chrome[7fc66d751000+5477000]
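
The driver prints the same set of status registers before and after the soft reset. As an illustration only, assuming the `R_xxxxxx_NAME = 0x...` format shown above, a short Python sketch can list which registers changed between the two dumps. `parse_status` and `changed_registers` are hypothetical names; the sketch does not interpret individual bits.

```python
import re

# Status dump printed before the soft reset (values from the log above).
BEFORE = """\
R_008010_GRBM_STATUS      = 0xA0003030
R_008014_GRBM_STATUS2     = 0x00000003
R_000E50_SRBM_STATUS      = 0x200000C0
R_008674_CP_STALLED_STAT1 = 0x00000000
R_008678_CP_STALLED_STAT2 = 0x00000000
R_00867C_CP_BUSY_STAT     = 0x00000006
R_008680_CP_STAT          = 0x80000645
R_00D034_DMA_STATUS_REG   = 0x44C83D57
"""

# Status dump printed after the soft reset.
AFTER = """\
R_008010_GRBM_STATUS      = 0xA0003030
R_008014_GRBM_STATUS2     = 0x00000003
R_000E50_SRBM_STATUS      = 0x200080C0
R_008674_CP_STALLED_STAT1 = 0x00000000
R_008678_CP_STALLED_STAT2 = 0x00000000
R_00867C_CP_BUSY_STAT     = 0x00000000
R_008680_CP_STAT          = 0x80100000
R_00D034_DMA_STATUS_REG   = 0x44C83D57
"""

REG_RE = re.compile(r"(R_\w+)\s*=\s*0x([0-9A-Fa-f]+)")


def parse_status(text):
    """Map register name to integer value for one status dump."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def changed_registers(before, after):
    """Return {name: (old, new)} for registers whose value differs."""
    old, new = parse_status(before), parse_status(after)
    return {name: (old[name], new[name])
            for name in old if name in new and old[name] != new[name]}


if __name__ == "__main__":
    for name, (old, new) in changed_registers(BEFORE, AFTER).items():
        print(f"{name}: {old:#010x} -> {new:#010x}")
```

On the values in this report, only SRBM_STATUS, CP_BUSY_STAT and CP_STAT differ; GRBM_STATUS reads 0xA0003030 both before and after the soft reset.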

[ 1370.128911] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 1370.260939] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[ 1370.977120] radeon 0000:01:00.0: Wait for MC idle timedout !
[ 1371.156252] radeon 0000:01:00.0: Wait for MC idle timedout !
[ 1371.159481] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 1371.159845] divide error: 0000 [#1] SMP
[ 1371.160008] Modules linked in: vhost_net vhost macvtap macvlan tun bridge stp llc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev media snd_usb_audio snd_usbmidi_lib snd_rawmidi sdhci_pci arc4 iwldvm sdhci iTCO_wdt iTCO_vendor_support coretemp mac80211 mmc_core snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel kvm_intel kvm r592 memstick snd_hda_controller microcode i2c_i801 thinkpad_acpi lpc_ich mfd_core snd_hda_codec snd_hwdep tpm_tis snd_seq iwlwifi snd_seq_device snd_pcm snd_timer snd tpm cfg80211 soundcore rfkill mei_me shpchp wmi mei acpi_cpufreq sunrpc binfmt_misc i915 radeon e1000e i2c_algo_bit drm_kms_helper ttm drm ptp pps_core i2c_core video
[ 1371.160008] CPU: 0 PID: 983 Comm: Xorg.bin Tainted: G        W     3.16.0-0.rc4.git0.1.fc21.1.x86_64 #1
[ 1371.160008] Hardware name: LENOVO 4058CTO/4058CTO, BIOS 6FET93WW (3.23 ) 10/12/2012
[ 1371.160008] task: ffff88024f6913c0 ti: ffff880250f7c000 task.ti: ffff880250f7c000
[ 1371.160008] RIP: 0010:[<ffffffffa01973ca>]  [<ffffffffa01973ca>] r6xx_remap_render_backend+0x6a/0xe0 [radeon]
[ 1371.160008] RSP: 0018:ffff880250f7fa88  EFLAGS: 00010246
[ 1371.160008] RAX: 0000000000000002 RBX: 00000000ffffffff RCX: 0000000000000002
[ 1371.160008] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000002
[ 1371.160008] RBP: ffff880250f7fac0 R08: 00000000000000ff R09: 0000000000000692
[ 1371.160008] R10: 0000000000002000 R11: 2e29303030303430 R12: 0000000080000000
[ 1371.160008] R13: 00000000000000ff R14: 0000000000000000 R15: 0000000000000000
[ 1371.160008] FS:  00007f08a40cf9c0(0000) GS:ffff88025bc00000(0000) knlGS:0000000000000000
[ 1371.160008] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1371.160008] CR2: 00007f6a7aca7000 CR3: 000000024f7a2000 CR4: 00000000000427f0
[ 1371.160008] Stack:
[ 1371.160008]  ffff88009d104000 0000000200000200 ffff88009d104000 000000000000c352
[ 1371.160008]  00000000ffffffff 000000000000cb52 0000000000ffff00 ffff880250f7fb18
[ 1371.160008]  ffffffffa019a52c 0000000000000000 ffffffff00000000 00000000ffffffff
[ 1371.160008] Call Trace:
[ 1371.160008]  [<ffffffffa019a52c>] r600_startup+0x85c/0x16e0 [radeon]
[ 1371.160008]  [<ffffffffa019b3e3>] r600_resume+0x33/0x70 [radeon]
[ 1371.160008]  [<ffffffffa014b7f1>] radeon_gpu_reset+0x131/0x2c0 [radeon]
[ 1371.160008]  [<ffffffffa017e66f>] radeon_cs_ioctl+0x2ef/0x720 [radeon]
[ 1371.160008]  [<ffffffffa002aa9f>] drm_ioctl+0x1df/0x680 [drm]
[ 1371.160008]  [<ffffffff8105be2c>] ? __do_page_fault+0x29c/0x580
[ 1371.160008]  [<ffffffffa014904c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[ 1371.160008]  [<ffffffff81211d90>] do_vfs_ioctl+0x2d0/0x4b0
[ 1371.160008]  [<ffffffff81211ff1>] SyS_ioctl+0x81/0xa0
[ 1371.160008]  [<ffffffff8171eca9>] system_call_fastpath+0x16/0x1b
[ 1371.160008] Code: b6 ed 45 09 c5 41 80 fd ff 45 0f 44 e8 d3 e7 89 7d d4 44 89 ef e8 97 ff ff ff 8b 4d d4 41 29 c7 44 39 f9 72 6c 89 c8 31 d2 89 cf <41> f7 f7 44 0f af f8 89 c6 48 8b 45 c8 44 29 ff 83 b8 70 01 00
[ 1371.160008] RIP  [<ffffffffa01973ca>] r6xx_remap_render_backend+0x6a/0xe0 [radeon]
[ 1371.160008]  RSP <ffff880250f7fa88>
[ 1371.405115] ---[ end trace e07978cfec9678e4 ]---
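
The `Code:` line marks the faulting instruction bytes as `<41> f7 f7`, which on x86-64 decode to `div %r15d`, and the register dump shows R15 as zero. That is consistent with the reported divide error being a division by zero inside r6xx_remap_render_backend. A small Python sketch, assuming the oops register format shown above, that parses the general-purpose registers so the zero divisor can be checked mechanically. `parse_registers` is a hypothetical helper name, not a kernel or crash-tool API.

```python
import re

# General-purpose register lines copied from the oops above.
OOPS = """\
[ 1371.160008] RAX: 0000000000000002 RBX: 00000000ffffffff RCX: 0000000000000002
[ 1371.160008] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000002
[ 1371.160008] RBP: ffff880250f7fac0 R08: 00000000000000ff R09: 0000000000000692
[ 1371.160008] R10: 0000000000002000 R11: 2e29303030303430 R12: 0000000080000000
[ 1371.160008] R13: 00000000000000ff R14: 0000000000000000 R15: 0000000000000000
"""

# Three-character register name followed by a 16-digit hex value.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b")


def parse_registers(text):
    """Map register name (e.g. 'R15') to its integer value."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


if __name__ == "__main__":
    regs = parse_registers(OOPS)
    # div %r15d divides EDX:EAX by the low 32 bits of R15.
    divisor = regs["R15"] & 0xFFFFFFFF
    print(f"dividend low word (EAX) = {regs['RAX'] & 0xFFFFFFFF}, divisor (R15D) = {divisor}")
    if divisor == 0:
        print("divisor is zero: matches the 'divide error' in the oops")
```

Why the divisor ends up as zero after the failed resume is not stated in this report; the sketch only confirms what the register dump shows.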

It goes no further: X is locked up, and I have to do a cold power-off because systemd never shuts the machine down cleanly from the ssh console.
Comment 2 Shawn Starr 2014-11-10 05:10:38 UTC
This was fixed by the kernel hotplug changes.

