Bug 98481 - [Hawaii] Radeon kernel oops with vfio-pci
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: XOrg git
Hardware: PowerPC All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2016-10-28 18:54 UTC by Timothy Pearson
Modified: 2019-11-19 09:19 UTC

Attachments
Fix radeon kernel oops when used in vfio-based guest (444 bytes, patch)
2016-10-28 22:05 UTC, Timothy Pearson

Description Timothy Pearson 2016-10-28 18:54:03 UTC
When DPM is enabled (dpm=1) and a Hawaii card (Radeon R9 290X) is passed to a QEMU virtual machine via vfio, the guest kernel oopses with the following output:

[ 1298.394440] [drm] Initialized drm 1.1.0 20060810
[ 1298.426253] [drm] radeon kernel modesetting enabled.
[ 1298.426284] checking generic (100a0000000 1d4c00) vs hw (10130000000 10000000)
[ 1298.427014] [drm] initializing kernel modesetting (HAWAII 0x1002:0x67B0 0x1002:0x0B00 0x00).
[ 1298.427030] [drm] register mmio base: 0xE0000000
[ 1298.427034] [drm] register mmio size: 262144
[ 1298.427038] [drm] doorbell mmio base: 0x40000000
[ 1298.427042] [drm] doorbell mmio size: 8388608
[ 1298.427082] [drm:radeon_device_init [radeon]] *ERROR* Unable to find PCI I/O BAR
[ 1298.917871] [drm:radeon_atombios_init [radeon]] *ERROR* Unable to find PCI I/O BAR; using MMIO for ATOM IIO
[ 1298.917880] ATOM BIOS: C67101
[ 1298.917971] radeon 0000:00:03.0: VRAM: 4096M 0x0000000000000000 - 0x00000000FFFFFFFF (4096M used)
[ 1298.917975] radeon 0000:00:03.0: GTT: 2048M 0x0000000100000000 - 0x000000017FFFFFFF
[ 1298.917978] [drm] Detected VRAM RAM=4096M, BAR=256M
[ 1298.917980] [drm] RAM width 512bits DDR
[ 1298.918085] [TTM] Zone  kernel: Available graphics memory: 33505984 kiB
[ 1298.918088] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[ 1298.918091] [TTM] Initializing pool allocator
[ 1298.918119] [drm] radeon: 4096M of VRAM memory ready
[ 1298.918122] [drm] radeon: 2048M of GTT memory ready.
[ 1298.918141] [drm] Loading hawaii Microcode
[ 1298.918151] [drm] Internal thermal controller with fan control
[ 1298.918206] Unable to handle kernel paging request for data at address 0x0000003c
[ 1298.918210] Faulting instruction address: 0xd00000000b3bff68
[ 1298.918214] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1298.918216] SMP NR_CPUS=2048 NUMA pSeries
[ 1298.918220] Modules linked in: radeon(E+) ttm(E) drm_kms_helper(E) drm(E) fuse(E) sg(E) snd_hda_codec_hdmi(E) evdev(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) ghash_generic(E) gf128mul(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) vmx_crypto(E) i2c_algo_bit(E) soundcore(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) jbd2(E) crc32c_generic(E) fscrypto(E) ecb(E) mbcache(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) hid(E) ibmvscsi(E) scsi_transport_srp(E) virtio_net(E) virtio_pci(E) virtio_ring(E) virtio(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) [last unloaded: drm]
[ 1298.918268] CPU: 32 PID: 2466 Comm: modprobe Tainted: G        W   E   4.8.0-trunk-powerpc64le #1 Debian 4.8.4-1~exp1
[ 1298.918273] task: c000000fea6b9b80 task.stack: c000000fe123c000
[ 1298.918276] NIP: d00000000b3bff68 LR: d00000000ba30c88 CTR: d00000000b3bff20
[ 1298.918280] REGS: c000000fe123ede0 TRAP: 0300   Tainted: G        W   E    (4.8.0-trunk-powerpc64le Debian 4.8.4-1~exp1)
[ 1298.918284] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 84228428  XER: 20000000
[ 1298.918296] CFAR: c0000000000acea8 DAR: 000000000000003c DSISR: 40000000 SOFTE: 1
               GPR00: d00000000ba30c88 c000000fe123f060 d00000000b40af90 c000000fea554000
               GPR04: c000000fe123f140 0000000000000000 c000000fe2c6c000 c000000fff528050
               GPR08: 0000000ffea30000 c000000fee70e000 0000000000000000 d00000000ba58d80
               GPR12: d00000000b3bff20 c00000000fb92000 0000000020000000 c000000fe123fdec
               GPR16: c000000fe8f60000 c0000000008552c0 00000000000011d9 d00000000baf0000
               GPR20: d00000000baf0000 c00000000017de10 0000000000000000 0000000000000124
               GPR24: c000000fe2c6a000 c000000fe23df100 d00000000bab1640 d00000000bab1e48
               GPR28: c000000fe8f61b70 c000000fe8f61b70 0000000000000000 c000000fe123f140
[ 1298.918345] NIP [d00000000b3bff68] drm_pcie_get_speed_cap_mask+0x48/0x150 [drm]
[ 1298.918376] LR [d00000000ba30c88] ci_dpm_init+0xa8/0x13b0 [radeon]
[ 1298.918379] Call Trace:
[ 1298.918383] [c000000fe123f060] [c000000fe123f0e0] 0xc000000fe123f0e0 (unreliable)
[ 1298.918413] [c000000fe123f0e0] [d00000000ba30c88] ci_dpm_init+0xa8/0x13b0 [radeon]
[ 1298.918445] [c000000fe123f1f0] [d00000000b981a58] radeon_pm_init+0x628/0x910 [radeon]
[ 1298.918480] [c000000fe123f290] [d00000000b9e0f84] cik_init+0x324/0x790 [radeon]
[ 1298.918509] [c000000fe123f300] [d00000000b8f2c38] radeon_device_init+0x5a8/0xc70 [radeon]
[ 1298.918539] [c000000fe123f390] [d00000000b8f6014] radeon_driver_load_kms+0xc4/0x280 [radeon]
[ 1298.918548] [c000000fe123f410] [d00000000b3be15c] drm_dev_register+0xfc/0x140 [drm]
[ 1298.918555] [c000000fe123f450] [d00000000b3c0798] drm_get_pci_dev+0xf8/0x220 [drm]
[ 1298.918574] [c000000fe123f4e0] [d00000000b8f0664] radeon_pci_probe+0x134/0x1b0 [radeon]
[ 1298.918580] [c000000fe123f570] [c00000000050471c] local_pci_probe+0x6c/0x140
[ 1298.918583] [c000000fe123f600] [c0000000005055a8] pci_device_probe+0x168/0x200
[ 1298.918588] [c000000fe123f660] [c0000000005b4900] driver_probe_device+0x240/0x550
[ 1298.918591] [c000000fe123f6f0] [c0000000005b4d7c] __driver_attach+0x16c/0x170
[ 1298.918595] [c000000fe123f770] [c0000000005b100c] bus_for_each_dev+0x9c/0x110
[ 1298.918598] [c000000fe123f7c0] [c0000000005b3b5c] driver_attach+0x3c/0x60
[ 1298.918626] [c000000fe123f7f0] [c0000000005b33a8] bus_add_driver+0x308/0x390
[ 1298.918630] [c000000fe123f880] [c0000000005b5d1c] driver_register+0x9c/0x180
[ 1298.918633] [c000000fe123f8f0] [c0000000005038dc] __pci_register_driver+0x6c/0x90
[ 1298.918640] [c000000fe123f930] [d00000000b3c0a24] drm_pci_init+0x164/0x1b0 [drm]
[ 1298.918659] [c000000fe123f9c0] [d00000000ba53440] radeon_init+0xc8/0xf8 [radeon]
[ 1298.918663] [c000000fe123fa30] [c00000000000b74c] do_one_initcall+0x6c/0x1d0
[ 1298.918667] [c000000fe123faf0] [c000000000822b50] do_init_module+0x94/0x254
[ 1298.918671] [c000000fe123fb80] [c0000000001821b0] load_module+0x2380/0x2b30
[ 1298.918674] [c000000fe123fd30] [c000000000182ca0] SyS_finit_module+0xf0/0x170
[ 1298.918678] [c000000fe123fe30] [c000000000009560] system_call+0x38/0x108
[ 1298.918680] Instruction dump:
[ 1298.918682] f821ff81 7c7e1b78 7c9f2378 48000008 e8410018 39200000 913f0000 e93e01e8
[ 1298.918688] 2fa90000 41de0044 e9290010 ebc90038 <a13e003c> 2b891106 419e0030 2b891166
[ 1298.918696] ---[ end trace 6c565dbe73743fb5 ]---

Setting dpm=0 allows the radeon driver to load normally and the card functions as expected under Xorg.
Comment 1 Timothy Pearson 2016-10-28 19:00:23 UTC
This appears related:

https://bbs.archlinux.org/viewtopic.php?pid=1537645#p1537645
Comment 2 Alex Deucher 2016-10-28 19:10:39 UTC
Most hypervisors do not expose the PCI config registers that the driver needs to read to determine which PCIe speeds are available.  That said, this should work better with amdgpu and the 4.10-wip kernel:
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.10-wip
Comment 3 Timothy Pearson 2016-10-28 22:05:23 UTC
Created attachment 127594
Fix radeon kernel oops when used in vfio-based guest

The amdgpu driver doesn't look quite ready for production use on SI GPUs, but I was able to work around the issue in the radeon driver via the attached patch.
Comment 4 Martin Peres 2019-11-19 09:19:19 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug on our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/749

