Bug 101007 - AMDgpu kernel BUG with 5 cards
Status: CLOSED NOTABUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu-pro
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2017-05-11 15:44 UTC by GSaraber
Modified: 2017-05-11 17:53 UTC (History)

Attachments
lspci output (77.40 KB, text/plain)
2017-05-11 17:38 UTC, GSaraber

Description GSaraber 2017-05-11 15:44:40 UTC
Ubuntu server 17.04, amdgpu-pro-17.10-414273
Linux mining1 4.10.0-20-generic #22-Ubuntu SMP Thu Apr 20 09:22:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
The machine has 5 cards installed:
00:00.0 Host bridge: Intel Corporation Device 590f (rev 06)
00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 06)
00:01.1 PCI bridge: Intel Corporation Skylake PCIe Controller (x8) (rev 06)
00:14.0 USB controller: Intel Corporation 200 Series PCH USB 3.0 xHCI Controller
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1
00:1b.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #17 (rev f0)
00:1b.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #1 (rev f0)
00:1c.1 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #2 (rev f0)
00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5 (rev f0)
00:1c.6 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #7 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation 200 Series PCH LPC Controller (Z270)
00:1f.2 Memory controller: Intel Corporation 200 Series PCH PMC
00:1f.4 SMBus: Intel Corporation 200 Series PCH SMBus Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
02:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
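To confirm the card count from a listing like the one above, the VGA controller lines can be picked out mechanically. This is only a minimal sketch: the function name `list_gpus` and the regular expression are illustrative assumptions, and it expects short-form `lspci` text like the lines pasted in this report.

```python
import re

# Matches short-form lspci lines such as
# "01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. ..."
# The pattern is an assumption based on the listing in this report.
GPU_LINE = re.compile(
    r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) VGA compatible controller: (.+)$"
)


def list_gpus(lspci_text):
    """Return (slot, description) pairs for every VGA controller line."""
    gpus = []
    for line in lspci_text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((match.group(1), match.group(2)))
    return gpus


# A few lines copied from the listing above, used as sample input.
sample = """\
00:01.0 PCI bridge: Intel Corporation Skylake PCIe Controller (x16) (rev 06)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
02:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
"""

for slot, description in list_gpus(sample):
    print(slot, description)
```

Run against the full listing, this should report the five Ellesmere cards at 01:00.0, 02:00.0, 04:00.0, 06:00.0 and 08:00.0.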

This causes a kernel BUG at boot:
[    8.592737] amdgpu 0000:08:00.0: GTT: 4096M 0x0000000100000000 - 0x00000001FFFFFFFF
[    8.598389] ------------[ cut here ]------------
[    8.601935] kernel BUG at /build/linux-2NWldV/linux-4.10.0/arch/x86/mm/pat.c:539!
[    8.605587] invalid opcode: 0000 [#1] SMP
[    8.609243] Modules linked in: hid_generic usbhid mxm_wmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd psmouse e1000e ptp pps_core amdkfd amd_iommu_v2 amdgpu(+) i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm video wmi pinctrl_sunrisepoint pinctrl_intel i2c_hid hid fjes
[    8.617618] CPU: 1 PID: 145 Comm: systemd-udevd Not tainted 4.10.0-20-generic #22-Ubuntu
[    8.621935] Hardware name: System manufacturer System Product Name/PRIME Z270-A, BIOS 0906 03/22/2017
[    8.626312] task: ffff92630d3a2c00 task.stack: ffffaf7d410d0000
[    8.630720] RIP: 0010:reserve_memtype+0x197/0x3d0
[    8.635049] RSP: 0018:ffffaf7d410d3820 EFLAGS: 00010246
[    8.639429] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffaf7d410d387c
[    8.643902] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[    8.648401] RBP: ffffaf7d410d3860 R08: 0000000000000000 R09: 00000000000004f3
[    8.652927] R10: ffff92630ad67980 R11: 0000000000000001 R12: 0000000000000000
[    8.657479] R13: 0000000000000001 R14: 0000000000000000 R15: ffffaf7d410d38c4
[    8.662076] FS:  00007fb66305e8c0(0000) GS:ffff92631ed00000(0000) knlGS:0000000000000000
[    8.666777] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.671518] CR2: 000055d374b250f8 CR3: 000000028c451000 CR4: 00000000003406e0
[    8.676355] Call Trace:
[    8.681140]  io_reserve_memtype+0x59/0x130
[    8.685876]  ? amdgpu_mm_rreg+0xd0/0xd0 [amdgpu]
[    8.690513]  arch_io_reserve_memtype_wc+0x2f/0x50
[    8.695080]  amdgpu_bo_init+0x20/0x90 [amdgpu]
[    8.699560]  gmc_v8_0_sw_init+0x350/0x5c0 [amdgpu]
[    8.704078]  amdgpu_device_init+0xb6b/0x1180 [amdgpu]
[    8.708503]  ? kmalloc_order+0x18/0x40
[    8.712905]  ? kmalloc_order_trace+0x24/0xa0
[    8.717347]  amdgpu_driver_load_kms+0x5b/0x1f0 [amdgpu]
[    8.721837]  drm_dev_register+0x132/0x170 [drm]
[    8.726366]  drm_get_pci_dev+0x9c/0x1c0 [drm]
[    8.730937]  amdgpu_pci_probe+0xbc/0xe0 [amdgpu]
[    8.735542]  local_pci_probe+0x45/0xa0
[    8.740177]  pci_device_probe+0x103/0x150
[    8.744854]  driver_probe_device+0x2bb/0x460
[    8.749572]  __driver_attach+0xdf/0xf0
[    8.754322]  ? driver_probe_device+0x460/0x460
[    8.759087]  bus_for_each_dev+0x6c/0xc0
[    8.763880]  driver_attach+0x1e/0x20
[    8.768662]  bus_add_driver+0x170/0x270
[    8.773440]  driver_register+0x60/0xe0
[    8.778244]  __pci_register_driver+0x4c/0x50
[    8.783060]  drm_pci_init+0xeb/0x100 [drm]
[    8.787864]  amdgpu_init+0x95/0xa8 [amdgpu]
[    8.792636]  ? 0xffffffffc0448000
[    8.797405]  do_one_initcall+0x52/0x1b0
[    8.802187]  ? __vunmap+0x81/0xd0
[    8.806975]  ? kmem_cache_alloc_trace+0x142/0x190
[    8.811816]  do_init_module+0x5f/0x200
[    8.816620]  load_module+0x190b/0x1c70
[    8.821382]  ? __symbol_put+0x60/0x60
[    8.826175]  ? ima_post_read_file+0x7e/0xa0
[    8.831012]  ? security_kernel_post_read_file+0x6b/0x80
[    8.835914]  SYSC_finit_module+0xdf/0x110
[    8.840849]  SyS_finit_module+0xe/0x10
[    8.845814]  entry_SYSCALL_64_fastpath+0x1e/0xad
[    8.850837] RIP: 0033:0x7fb661eccdf9
[    8.855891] RSP: 002b:00007ffecfea4ad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    8.861083] RAX: ffffffffffffffda RBX: 0000000000000011 RCX: 00007fb661eccdf9
[    8.866348] RDX: 0000000000000000 RSI: 00007fb6627f0e23 RDI: 0000000000000012
[    8.871657] RBP: 00007ffecfea3ae0 R08: 0000000000000000 R09: 00007ffecfea5050
[    8.877020] R10: 0000000000000012 R11: 0000000000000246 R12: 000055d374a5a170
[    8.882423] R13: 00007ffecfea3ac0 R14: 0000000000000005 R15: 000055d372c3fc3c
[    8.887786] Code: c7 c1 2d de a7 ae 83 f8 05 77 08 48 8b 0c c5 60 a1 80 ae 49 8d 55 ff 4c 89 f6 48 c7 c7 78 65 a7 ae e8 7d aa 13 00 e9 af fe ff ff <0f> 0b 48 8d 55 cf 4c 89 ee 4c 89 f7 31 db e8 66 a9 fd ff 3c 06 
[    8.898931] RIP: reserve_memtype+0x197/0x3d0 RSP: ffffaf7d410d3820
[    8.904485] ---[ end trace cb0c50b70460a136 ]---
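When triaging a trace like this, it can help to reduce it to the BUG location and the reliable call-trace frames. The sketch below does that for dmesg-style text. The name `summarize_oops` and both regular expressions are illustrative assumptions, not part of any kernel tooling. Frames prefixed with "?" are skipped because the unwinder marks them as unreliable.

```python
import re

# "kernel BUG at <file>:<line>!"
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")

# "[ timestamp ]  [? ]symbol+0xoff/0xsize [module]"
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)


def summarize_oops(log_text):
    """Return ((file, line) or None, [(symbol, module or None), ...])."""
    bug = BUG_RE.search(log_text)
    location = (bug.group(1), int(bug.group(2))) if bug else None
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match and not match.group(1):  # drop "?" (unreliable) frames
            frames.append((match.group(2), match.group(3)))
    return location, frames


# Lines copied from the trace above, used as sample input.
sample = """\
[    8.601935] kernel BUG at /build/linux-2NWldV/linux-4.10.0/arch/x86/mm/pat.c:539!
[    8.676355] Call Trace:
[    8.681140]  io_reserve_memtype+0x59/0x130
[    8.685876]  ? amdgpu_mm_rreg+0xd0/0xd0 [amdgpu]
[    8.690513]  arch_io_reserve_memtype_wc+0x2f/0x50
[    8.695080]  amdgpu_bo_init+0x20/0x90 [amdgpu]
"""

location, frames = summarize_oops(sample)
print(location)
for symbol, module in frames:
    print(symbol, module or "(core kernel)")
```

For this report the summary shows the failure in `arch/x86/mm/pat.c`, reached from `amdgpu_bo_init` through `arch_io_reserve_memtype_wc`.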
Comment 1 Christian König 2017-05-11 15:49:12 UTC
Please attach the output of "sudo lspci -vvvv".
Comment 2 GSaraber 2017-05-11 17:34:22 UTC
Update: There is a BIOS setting, "Decode 4G" under Advanced/Boot, that needs to be turned on. Enabling it stopped the crash. Sorry for the wild goose chase.
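One way to see what such a setting changes is to look at where each card's PCI regions end up. The sketch below parses the text of a sysfs `resource` file, where each line holds a start address, an end address, and flags in hex, and an all-zero line means the region is unassigned. It assumes that "Decode 4G" controls whether firmware may place 64-bit BARs above the 4 GiB boundary, as options of that name usually do. The function name `parse_bars` is illustrative, and the sample addresses are made up, not taken from this machine.

```python
FOUR_GIB = 1 << 32


def parse_bars(resource_text):
    """Parse sysfs 'resource' text into a list of assigned regions.

    Lines 0-5 of the file correspond to the six standard BARs.
    """
    bars = []
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # region not assigned by firmware or kernel
        bars.append({
            "index": index,
            "start": start,
            "size": end - start + 1,
            "above_4g": start >= FOUR_GIB,
        })
    return bars


# Hypothetical sample values, NOT from the reporter's system.
sample = """\
0x00000000e0000000 0x00000000efffffff 0x000000000014220c
0x0000002000000000 0x000000200fffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""

for bar in parse_bars(sample):
    where = "above 4 GiB" if bar["above_4g"] else "below 4 GiB"
    print(f"region {bar['index']}: {bar['size'] // (1 << 20)} MiB, {where}")
```

On a real system the input would come from a file such as /sys/bus/pci/devices/0000:08:00.0/resource, once per card.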
Comment 3 GSaraber 2017-05-11 17:38:38 UTC
Created attachment 131318
lspci output

Here's the lspci -vvv output
Comment 4 Christian König 2017-05-11 17:53:54 UTC
Yeah, I expected it to be something like that.

BTW: Any interest in giving my resizeable BAR patch set a try?

It allows the driver to make all of video memory accessible to the CPU, not just the first 256MB.

Since you have a rather unusual system configuration, a few results from your board would be interesting to have.

