Bug 105684 - Loading amdgpu hits general protection fault: 0000 [#1] SMP NOPTI
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2018-03-22 08:19 UTC by jian-hong
Modified: 2019-11-19 08:33 UTC

Attachments
general protection fault: 0000 [#1] SMP NOPTI (8.70 KB, text/plain)
2018-03-22 08:22 UTC, jian-hong
dmesg before loading amdgpu module (62.19 KB, text/plain)
2018-03-23 02:58 UTC, jian-hong
Oops1 after loading amdgpu module (8.68 KB, text/plain)
2018-03-23 02:59 UTC, jian-hong
Oops2 after loading amdgpu module (11.59 KB, text/plain)
2018-03-23 03:00 UTC, jian-hong
Oops3 after loading amdgpu module (13.85 KB, text/plain)
2018-03-23 03:01 UTC, jian-hong
Oops4 after loading amdgpu module (17.64 KB, text/plain)
2018-03-23 03:01 UTC, jian-hong
Oops5 after loading amdgpu module (21.75 KB, text/plain)
2018-03-23 03:01 UTC, jian-hong
Oops6 after loading amdgpu module (23.27 KB, text/plain)
2018-03-23 03:02 UTC, jian-hong
Oops7 after loading amdgpu module (22.73 KB, text/plain)
2018-03-23 03:02 UTC, jian-hong
dmesg of loading amdgpu module - tested in 4.16-rc7 (27.84 KB, application/zip)
2018-03-27 06:39 UTC, jian-hong
tested with Linux kernel 4.16+ (commit f8cf2f16a7c95acce497bfafa90e7c6d8397d653) (70.93 KB, text/plain)
2018-04-09 07:30 UTC, jian-hong
dmesg of loading amdgpu module - tested in 4.17-rc1 (70.98 KB, text/plain)
2018-04-17 07:56 UTC, jian-hong
dmesg of loading amdgpu module - tested in 4.17-rc2 (71.17 KB, text/plain)
2018-04-26 03:54 UTC, jian-hong
amdgpu_vbios (53.00 KB, application/octet-stream)
2018-05-10 07:10 UTC, jian-hong
dmesg of loading amdgpu module with patch 218586 (73.80 KB, text/plain)
2018-05-10 09:50 UTC, jian-hong
dmesg of loading amdgpu module - tested in 4.17-rc4 (71.92 KB, text/plain)
2018-05-11 03:20 UTC, jian-hong
Arch current default kernel config (212.06 KB, text/x-mpsub)
2018-05-12 20:37 UTC, Aaron
config of Linux kernel with freedesktop branch (210.60 KB, text/x-mpsub)
2018-05-14 03:00 UTC, jian-hong
config of Linux kernel 4.17-rc4 (208.30 KB, text/x-mpsub)
2018-05-14 03:04 UTC, jian-hong
dmesg of 4.18.0-rc6 (67.29 KB, text/plain)
2018-07-24 09:27 UTC, jian-hong
4.18.0-rc6 Build config file (129.18 KB, text/x-mpsub)
2018-07-26 08:21 UTC, jian-hong
Oops on debian kernel 4.19.0-0.bpo.4-amd64 (10.28 KB, text/plain)
2019-04-29 15:24 UTC, Jörn Frenzel

Description jian-hong 2018-03-22 08:19:25 UTC
I have a computer with an AMD Ryzen 5 2400G with Radeon Vega Graphics.
Comment 1 jian-hong 2018-03-22 08:22:48 UTC
Created attachment 138270 [details]
general protection fault: 0000 [#1] SMP NOPTI
Comment 2 jian-hong 2018-03-22 08:31:22 UTC
Sorry for the earlier messages; they were posted through an editing error.

I have a computer with an AMD Ryzen 5 2400G with Radeon Vega Graphics.  When I load the amdgpu module manually, the kernel hits a panic.

<6>[  128.532951] [drm] Found VCN firmware Version: 1.45 Family ID: 18
<7>[  128.696866] [drm:amdgpu_dm_irq_init [amdgpu]] DM_IRQ
<7>[  128.697060] [drm:dal_firmware_parser_init_cmd_tbl [amdgpu]] Don't have set_crtc_timing for v1
<7>[  128.697357] [drm:dc_create [amdgpu]] DC: create_links: connectors_num: physical:4, virtual:0
<7>[  128.697443] [drm:log_to_debug_console [amdgpu]] Connector[0] description:signal 32
<7>[  128.697551] [drm:log_to_debug_console [amdgpu]] Connector[1] description:signal 4
<7>[  128.697652] [drm:log_to_debug_console [amdgpu]] Connector[2] description:signal 4
<4>[  128.697663] general protection fault: 0000 [#1] SMP NOPTI
<4>[  128.697670] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched ttm drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt efi_pstore edac_mce_amd kvm_amd ccp kvm irqbypass cmac bnep crct10dif_pclmul crc32_pclmul ghash_clmulni_intel arc4 pcbc aesni_intel btusb btrtl btbcm btintel bluetooth input_leds aes_x86_64 ecdh_generic snd_hda_codec_realtek ath10k_pci ath10k_core ath mac80211 cfg80211 snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd r8169 soundcore mii crypto_simd glue_helper cryptd shpchp i2c_piix4 sparse_keymap wmi_bmof psmouse mac_hid wmi tpm_crb video ip_tables x_tables serio_raw uas usb_storage ahci libahci hid_generic usbhid hid
<4>[  128.697752] CPU: 1 PID: 1216 Comm: modprobe Not tainted 4.16.0-rc6+ #6
<4>[  128.697756] Hardware name: Acer Aspire TC-380/Aspire TC-380, BIOS D05 02/01/2018
<4>[  128.697766] RIP: 0010:prefetch_freepointer+0x15/0x30
<4>[  128.697770] RSP: 0018:ffffbc1143f973e8 EFLAGS: 00010286
<4>[  128.697775] RAX: 0000000000000000 RBX: ffff9681b67cd000 RCX: 00000000000004cd
<4>[  128.697779] RDX: 00000000000004cc RSI: b8036bfcc8b36eaa RDI: ffff9681de806e80
<4>[  128.697783] RBP: ffffbc1143f973e8 R08: ffff9681dec67160 R09: ffffffffc0c4ac0c
<4>[  128.697787] R10: ffffe3f45fdcaa40 R11: 00000000000003c9 R12: 00000000014080c0
<4>[  128.697790] R13: ffff9681de806e80 R14: ffff9681b67cd000 R15: ffff9681de806e80
<4>[  128.697795] FS:  00007fc7a7ea9700(0000) GS:ffff9681dec40000(0000) knlGS:0000000000000000
<4>[  128.697800] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  128.697804] CR2: 0000559d453ef0c0 CR3: 00000007f8d8a000 CR4: 00000000003406e0
<4>[  128.697808] Call Trace:
<4>[  128.697815]  kmem_cache_alloc_trace+0xa5/0x1c0
<4>[  128.697877]  ? dal_ddc_service_create+0x3c/0x120 [amdgpu]
<4>[  128.697929]  dal_ddc_service_create+0x3c/0x120 [amdgpu]
<4>[  128.697985]  ? dal_gpio_service_create_irq+0x4a/0x70 [amdgpu]
<4>[  128.698030]  construct+0x25b/0x780 [amdgpu]
<4>[  128.698077]  link_create+0x38/0x60 [amdgpu]
<4>[  128.698121]  dc_create+0x2e6/0x6b0 [amdgpu]
<4>[  128.698169]  dm_hw_init+0x107/0xed0 [amdgpu]

Then the system hangs: I cannot switch to another tty, and I cannot ssh into the machine.
This happens about half of the time.

The debug messages are in the attachment.
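
For anyone comparing several of these traces, here is a minimal sketch that pulls the faulting symbol and the reliable call-trace frames out of an oops like the one above. The function names (`summarize_oops`, `first_frame_in_module`) and the regular expressions are my own illustration, not part of any kernel tool; they only assume the line layout visible in this log. Frames prefixed with "? " are skipped because the kernel marks them as unreliable.

```python
import re

# Matches e.g. "RIP: 0010:prefetch_freepointer+0x15/0x30"
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x[0-9a-f]+"
)
# Matches a frame such as "]  dc_create+0x2e6/0x6b0 [amdgpu]".
# Lines like "]  ? foo+0x1/0x2" do not match, so unreliable frames are dropped.
FRAME_RE = re.compile(
    r"\]\s+(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?\s*$"
)


def summarize_oops(log_text):
    """Return the faulting (symbol, offset) and the list of (symbol, module) frames."""
    rip = None
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if rip is None:
            m = RIP_RE.search(line)
            if m:
                rip = (m.group("sym"), int(m.group("off"), 16))
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.search(line)
            if m:
                frames.append((m.group("sym"), m.group("mod")))
    return {"rip": rip, "frames": frames}


def first_frame_in_module(frames, module):
    """Return the first frame symbol that belongs to the given module, or None."""
    for sym, mod in frames:
        if mod == module:
            return sym
    return None


# A few lines copied from the trace in this comment.
SAMPLE = """
<4>[  128.697766] RIP: 0010:prefetch_freepointer+0x15/0x30
<4>[  128.697808] Call Trace:
<4>[  128.697815]  kmem_cache_alloc_trace+0xa5/0x1c0
<4>[  128.697877]  ? dal_ddc_service_create+0x3c/0x120 [amdgpu]
<4>[  128.697929]  dal_ddc_service_create+0x3c/0x120 [amdgpu]
"""

if __name__ == "__main__":
    info = summarize_oops(SAMPLE)
    print(info["rip"], first_frame_in_module(info["frames"], "amdgpu"))
```

Running it on each attached log gives a quick way to see which amdgpu function was allocating memory when the fault hit.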
Comment 3 jian-hong 2018-03-22 09:08:27 UTC
The kernel version I tested is 4.16-rc6.
Comment 4 Harry Wentland 2018-03-22 13:56:53 UTC
Can you attach your complete dmesg log?
Comment 5 jian-hong 2018-03-23 02:58:17 UTC
Created attachment 138293 [details]
dmesg before loading amdgpu module
Comment 6 jian-hong 2018-03-23 02:59:40 UTC
Created attachment 138294 [details]
Oops1 after loading amdgpu module
Comment 7 jian-hong 2018-03-23 03:00:35 UTC
Created attachment 138295 [details]
Oops2 after loading amdgpu module
Comment 8 jian-hong 2018-03-23 03:01:06 UTC
Created attachment 138296 [details]
Oops3 after loading amdgpu module
Comment 9 jian-hong 2018-03-23 03:01:30 UTC
Created attachment 138297 [details]
Oops4 after loading amdgpu module
Comment 10 jian-hong 2018-03-23 03:01:53 UTC
Created attachment 138298 [details]
Oops5 after loading amdgpu module
Comment 11 jian-hong 2018-03-23 03:02:17 UTC
Created attachment 138299 [details]
Oops6 after loading amdgpu module
Comment 12 jian-hong 2018-03-23 03:02:43 UTC
Created attachment 138300 [details]
Oops7 after loading amdgpu module
Comment 13 jian-hong 2018-03-23 03:12:27 UTC
Because the system hangs after loading the amdgpu module, I cannot get the dmesg directly at that time.
Therefore, I load the efi-pstore module beforehand so that the dmesg is saved to EFI storage when the panic happens.

The attachments:
"dmesg before loading amdgpu module": the dmesg before loading the amdgpu module.
"Oops1~7 after loading amdgpu module": the dmesg gathered from efi-pstore, recorded while the panic was happening.  I concatenated the records by Oops number, ordered by part number.
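
The concatenation step can be sketched roughly as follows. This is only an illustration under assumptions: that the records show up as `dmesg-efi-*` files under `/sys/fs/pstore`, and that each record starts with a header line such as `Oops#1 Part2`. The helper names (`read_records`, `split_header`, `assemble`) are hypothetical. The `reverse` flag is there because the first part may hold the newest chunk of the log; check a real dump before relying on either order.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed record header, e.g. "Oops#1 Part3" or "Panic#2 Part1".
HEADER_RE = re.compile(r"^(?P<reason>\w+)#(?P<count>\d+) Part(?P<part>\d+)$")


def read_records(directory="/sys/fs/pstore", pattern="dmesg-efi-*"):
    """Read every matching pstore record as text (path and pattern are assumptions)."""
    return [p.read_text(errors="replace") for p in sorted(Path(directory).glob(pattern))]


def split_header(record):
    """Return ((reason, count), part, body) for one record."""
    first_line, _, body = record.partition("\n")
    m = HEADER_RE.match(first_line.strip())
    if m is None:
        raise ValueError(f"unrecognized pstore header: {first_line!r}")
    key = (m.group("reason"), int(m.group("count")))
    return key, int(m.group("part")), body


def assemble(records, reverse=False):
    """Group records by dump and join each dump's parts ordered by part number.

    reverse=False joins Part1, Part2, ...; reverse=True joins from the
    highest part number down to Part1.
    """
    dumps = defaultdict(list)
    for record in records:
        key, part, body = split_header(record)
        dumps[key].append((part, body))
    return {
        key: "".join(
            body for _, body in sorted(parts, key=lambda p: p[0], reverse=reverse)
        )
        for key, parts in dumps.items()
    }


if __name__ == "__main__":
    for (reason, count), text in sorted(assemble(read_records()).items()):
        Path(f"{reason}{count}.txt").write_text(text)
```

Reading `/sys/fs/pstore` normally needs root, and the records stay in EFI storage until the files are deleted.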
Comment 14 jian-hong 2018-03-27 06:39:45 UTC
Created attachment 138370 [details]
dmesg of loading amdgpu module - tested in 4.16-rc7

I also tested with Linux kernel v4.16-rc7 and hit the same problem when I loaded the amdgpu module manually.
The attachment is the whole dmesg for this test.

<6>[   96.502996] [drm] amdgpu kernel modesetting enabled.
<6>[   96.511435] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
<6>[   96.518456] Parsing CRAT table with 1 nodes
<6>[   96.519588] Creating topology SYSFS entries
<6>[   96.520729] Topology: Add APU node [0x0:0x0]
<6>[   96.521851] Finished initializing topology
<6>[   96.522991] kfd kfd: Initialized module
<7>[   96.524244] checking generic (e0000000 7f0000) vs hw (e0000000 10000000)
<6>[   96.524245] fb: switching to amdgpudrmfb from EFI VGA
<6>[   96.525412] Console: switching to colour dummy device 80x25
<6>[   96.525551] amdgpu 0000:09:00.0: enabling device (0006 -> 0007)
<6>[   96.525713] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x1025:0x1257 0xC6).
<6>[   96.525744] [drm] register mmio base: 0xFE700000
<6>[   96.525746] [drm] register mmio size: 524288
<6>[   96.528186] [drm] probing gen 2 caps for device 1022:15db = 700d03/e
<6>[   96.528193] [drm] probing mlw for device 1022:15db = 700d03
<6>[   96.528392] [drm] VCN decode is enabled in VM mode
<6>[   96.528395] [drm] VCN encode is enabled in VM mode
<6>[   96.528411] ATOM BIOS: 113-RAVEN-T08
<6>[   96.528443] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
<6>[   96.528455] amdgpu 0000:09:00.0: GTT: 1024M 0x000000F500000000 - 0x000000F53FFFFFFF
<6>[   96.528467] [drm] Detected VRAM RAM=1024M, BAR=256M
<6>[   96.528468] [drm] RAM width 64bits UNKNOWN
<6>[   96.528530] [TTM] Zone  kernel: Available graphics memory: 15950952 kiB
<6>[   96.528532] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
<6>[   96.528534] [TTM] Initializing pool allocator
<6>[   96.528538] [TTM] Initializing DMA pool allocator
<6>[   96.528567] [drm] amdgpu: 1024M of VRAM memory ready
<6>[   96.528569] [drm] amdgpu: 3072M of GTT memory ready.
<6>[   96.528576] [drm] GART: num cpu pages 262144, num gpu pages 262144
<6>[   96.529123] [drm] PCIE GART of 1024M enabled (table at 0x000000F400800000).
<6>[   96.550367] [drm] use_doorbell being set to: [true]
<6>[   96.565847] [drm] Found VCN firmware Version: 1.45 Family ID: 18
<6>[   96.720523] [drm] Display Core initialized with v3.1.27!
<4>[   96.720559] general protection fault: 0000 [#1] SMP NOPTI
<4>[   96.720562] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched ttm drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt efi_pstore arc4 cmac bnep edac_mce_amd kvm_amd ccp kvm irqbypass snd_hda_codec_realtek ath10k_pci ath10k_core ath btusb crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrtl input_leds btbcm btintel mac80211 snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore bluetooth ecdh_generic pcbc aesni_intel cfg80211 r8169 i2c_piix4 wmi_bmof mii sparse_keymap shpchp aes_x86_64 crypto_simd glue_helper cryptd psmouse mac_hid tpm_crb video wmi zram ip_tables x_tables serio_raw uas usb_storage ahci libahci hid_generic usbhid hid
<4>[   96.720624] CPU: 0 PID: 933 Comm: modprobe Not tainted 4.16.0-rc7+ #8
<4>[   96.720626] Hardware name: Acer Aspire TC-380/Aspire TC-380, BIOS D05 02/01/2018
<4>[   96.720635] RIP: 0010:prefetch_freepointer+0x15/0x30
<4>[   96.720637] RSP: 0018:ffffba2808367840 EFLAGS: 00010202
<4>[   96.720640] RAX: 0000000000000000 RBX: ffff9fcb9d9c5800 RCX: 0000000000000ae8
<4>[   96.720643] RDX: 0000000000000ae7 RSI: 597f068ab1e00726 RDI: ffff9fcbbf006e80
<4>[   96.720645] RBP: ffffba2808367840 R08: ffff9fcbbf627160 R09: ffffffffc096ef82
<4>[   96.720647] R10: 0000000000000024 R11: ffff9fcb99f3ac97 R12: 00000000014080c0
<4>[   96.720649] R13: ffff9fcbbf006e80 R14: ffff9fcb9d9c5800 R15: ffff9fcbbf006e80
<4>[   96.720653] FS:  00007f8bc0eab700(0000) GS:ffff9fcbbf600000(0000) knlGS:0000000000000000
<4>[   96.720655] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   96.720658] CR2: 00007ffc4eec6ff8 CR3: 00000007dd9a6000 CR4: 00000000003406f0
<4>[   96.720660] Call Trace:
<4>[   96.720666]  kmem_cache_alloc_trace+0xa5/0x1c0
<4>[   96.720741]  ? dm_hw_init+0x462/0xed0 [amdgpu]
<4>[   96.720792]  dm_hw_init+0x462/0xed0 [amdgpu]
<4>[   96.720832]  amdgpu_device_init+0xc1b/0x1340 [amdgpu]
<4>[   96.720872]  amdgpu_driver_load_kms+0x8b/0x2c0 [amdgpu]
<4>[   96.720888]  drm_dev_register+0x149/0x1e0 [drm]
<4>[   96.720927]  amdgpu_pci_probe+0x10a/0x180 [amdgpu]
<4>[   96.720931]  local_pci_probe+0x4a/0xa0
<4>[   96.720934]  pci_device_probe+0x109/0x1b0
<4>[   96.720938]  driver_probe_device+0x2bb/0x4a0
<4>[   96.720941]  __driver_attach+0xe2/0xf0
<4>[   96.720944]  ? driver_probe_device+0x4a0/0x4a0
<4>[   96.720947]  bus_for_each_dev+0x6a/0xc0
<4>[   96.720949]  ? kmem_cache_alloc_trace+0x1a6/0x1c0
<4>[   96.720952]  driver_attach+0x1e/0x20
<4>[   96.720955]  bus_add_driver+0x170/0x260
<4>[   96.720958]  driver_register+0x60/0xe0
<4>[   96.720961]  ? 0xffffffffc0af3000
<4>[   96.720964]  __pci_register_driver+0x5a/0x60
<4>[   96.721003]  amdgpu_init+0x83/0x92 [amdgpu]
<4>[   96.721006]  do_one_initcall+0x55/0x19d
<4>[   96.721009]  ? __vunmap+0x81/0xb0
<4>[   96.721013]  ? _cond_resched+0x1a/0x50
<4>[   96.721015]  ? kmem_cache_alloc_trace+0xa5/0x1c0
<4>[   96.721019]  ? do_init_module+0x27/0x219
<4>[   96.721021]  do_init_module+0x5f/0x219
<4>[   96.721024]  load_module+0x260e/0x2e10
<4>[   96.721028]  ? ima_post_read_file+0x83/0xa0
<4>[   96.721032]  SYSC_finit_module+0xe5/0x120
<4>[   96.721034]  ? SYSC_finit_module+0xe5/0x120
<4>[   96.721037]  SyS_finit_module+0xe/0x10
<4>[   96.721040]  do_syscall_64+0x73/0x130
<4>[   96.721043]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
<4>[   96.721045] RIP: 0033:0x7f8bc09f0229
<4>[   96.721047] RSP: 002b:00007ffc4eeca168 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4>[   96.721050] RAX: ffffffffffffffda RBX: 00005568f4a07230 RCX: 00007f8bc09f0229
<4>[   96.721053] RDX: 0000000000000000 RSI: 00005568f3310638 RDI: 000000000000000d
<4>[   96.721055] RBP: 00005568f3310638 R08: 0000000000000000 R09: 0000000000000000
<4>[   96.721057] R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
<4>[   96.721059] R13: 00005568f4a07360 R14: 0000000000040000 R15: 0000000000000000
<4>[   96.721062] Code: 49 8b 74 24 60 48 c7 c7 18 0c cf b4 e8 15 85 ea ff eb 90 0f 1f 00 0f 1f 44 00 00 55 48 85 f6 48 89 e5 74 14 48 63 47 20 48 01 c6 <48> 33 36 48 33 b7 40 01 00 00 0f 18 0e 5d c3 66 90 66 2e 0f 1f 
<1>[   96.721091] RIP: prefetch_freepointer+0x15/0x30 RSP: ffffba2808367840
<4>[   96.721094] ---[ end trace d865bcaaf3cc5d66 ]---
Comment 15 jian-hong 2018-04-09 07:30:36 UTC
Created attachment 138692 [details]
tested with Linux kernel 4.16+ (commit f8cf2f16a7c95acce497bfafa90e7c6d8397d653)

I tried Linux kernel 4.16+ (commit f8cf2f16a7c95acce497bfafa90e7c6d8397d653) and loaded the amdgpu module manually.
The error is different this time: the system hits a NULL pointer dereference at 0000000000000018 while loading the amdgpu module.

[   26.715245] [drm] use_doorbell being set to: [true]
[   26.730536] [drm] Found VCN firmware Version: 1.45 Family ID: 18
[   26.894292] amdgpu: [powerplay] dpm has been enabled
[   26.896020] [drm] Display Core initialized with v3.1.38!
[   26.896269] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[   26.896274] PGD 0 P4D 0 
[   26.896277] Oops: 0000 [#1] SMP NOPTI
[   26.896280] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched ttm drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt efi_pstore cmac bnep edac_mce_amd kvm_amd arc4 ccp kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek ghash_clmulni_intel snd_hda_codec_generic btusb pcbc snd_hda_codec_hdmi input_leds snd_hda_intel btrtl btbcm btintel bluetooth ecdh_generic snd_hda_codec aesni_intel snd_hda_core ath10k_pci ath10k_core wmi_bmof snd_hwdep aes_x86_64 crypto_simd cryptd r8169 glue_helper snd_pcm ath mac80211 sparse_keymap cfg80211 snd_timer snd mii wmi soundcore psmouse shpchp i2c_piix4 mac_hid video zram ip_tables x_tables hid_generic serio_raw uas ahci usb_storage libahci usbhid hid
[   26.896324] CPU: 5 PID: 752 Comm: modprobe Not tainted 4.16.0+ #10
[   26.896326] Hardware name: Acer Aspire TC-380/Aspire TC-380, BIOS D05 02/01/2018
[   26.896331] RIP: 0010:klist_node_init+0x1c/0x40
[   26.896333] RSP: 0018:ffffb89e823ef6e0 EFLAGS: 00010246
[   26.896335] RAX: ffff8bd9732e66f0 RBX: 0000000000000000 RCX: 000000000000121c
[   26.896337] RDX: 000000000000121b RSI: ffff8bd9732e66e8 RDI: 0000000000000000
[   26.896339] RBP: ffffb89e823ef6f8 R08: 00000000000271a0 R09: 00000000fffffffe
[   26.896341] R10: ffffee228fccbe00 R11: ffff8bd9732fa000 R12: ffff8bd9732e66e8
[   26.896343] R13: 0000000000000000 R14: ffff8bd98795c000 R15: ffff8bd98d4ef600
[   26.896346] FS:  00007f75665d7700(0000) GS:ffff8bd99ed40000(0000) knlGS:0000000000000000
[   26.896348] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.896350] CR2: 0000000000000018 CR3: 00000003f7986000 CR4: 00000000003406e0
[   26.896352] Call Trace:
[   26.896355]  ? klist_add_tail+0x18/0x50
[   26.896360]  device_add+0x38d/0x640
[   26.896363]  device_create_groups_vargs+0xe0/0xf0
[   26.896366]  device_create_with_groups+0x3f/0x60
[   26.896370]  ? fb_get_options+0x26/0x180
[   26.896382]  drm_sysfs_connector_add+0x59/0xa0 [drm]
[   26.896392]  drm_connector_register.part.9+0x4b/0xb0 [drm]
[   26.896402]  drm_connector_register+0x1a/0x20 [drm]
[   26.896455]  dm_hw_init+0x854/0xe50 [amdgpu]
[   26.896491]  amdgpu_device_init+0x13c5/0x1490 [amdgpu]
[   26.896526]  amdgpu_driver_load_kms+0x8b/0x2c0 [amdgpu]
[   26.896535]  drm_dev_register+0x149/0x1e0 [drm]
[   26.896571]  amdgpu_pci_probe+0x13f/0x1f0 [amdgpu]
[   26.896574]  local_pci_probe+0x4a/0xa0
[   26.896577]  pci_device_probe+0x109/0x1b0
[   26.896580]  driver_probe_device+0x2bb/0x4a0
[   26.896582]  __driver_attach+0xe2/0xf0
[   26.896584]  ? driver_probe_device+0x4a0/0x4a0
[   26.896587]  bus_for_each_dev+0x6a/0xc0
[   26.896590]  ? kmem_cache_alloc_trace+0x1c4/0x1d0
[   26.896592]  driver_attach+0x1e/0x20
[   26.896594]  bus_add_driver+0x170/0x260
[   26.896596]  driver_register+0x60/0xe0
[   26.896599]  ? 0xffffffffc0c04000
[   26.896601]  __pci_register_driver+0x5a/0x60
[   26.896636]  amdgpu_init+0x7a/0x89 [amdgpu]
[   26.896639]  do_one_initcall+0x55/0x19d
[   26.896642]  ? __vunmap+0x81/0xb0
[   26.896644]  ? _cond_resched+0x1a/0x50
[   26.896646]  ? kmem_cache_alloc_trace+0xbb/0x1d0
[   26.896650]  ? do_init_module+0x27/0x219
[   26.896653]  do_init_module+0x5f/0x219
[   26.896655]  load_module+0x260e/0x2e10
[   26.896659]  SYSC_finit_module+0xe5/0x120
[   26.896662]  ? SYSC_finit_module+0xe5/0x120
[   26.896665]  SyS_finit_module+0xe/0x10
[   26.896667]  do_syscall_64+0x73/0x130
Comment 16 jian-hong 2018-04-17 07:56:41 UTC
Created attachment 138874 [details]
dmesg of loading amdgpu module - tested in 4.17-rc1

I tried Linux kernel 4.17-rc1 and loaded the amdgpu module manually.
The error looks the same as in the test with Linux kernel v4.16-rc7: "general protection fault: 0000 [#1] SMP NOPTI" when loading the amdgpu module.

[   34.068370] [drm] use_doorbell being set to: [true]
[   34.092115] [drm] Found VCN firmware Version: 1.45 Family ID: 18
[   34.261809] amdgpu: [powerplay] dpm has been enabled
[   34.274305] [drm] Display Core initialized with v3.1.38!
[   34.274343] general protection fault: 0000 [#1] SMP NOPTI
[   34.274345] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched ttm drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt efi_pstore cmac bnep arc4 btusb btrtl btbcm zram btintel bluetooth snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep ecdh_generic input_leds snd_pcm edac_mce_amd kvm_amd ccp kvm snd_timer snd r8169 soundcore irqbypass crct10dif_pclmul crc32_pclmul ath10k_pci ath10k_core ath ghash_clmulni_intel mac80211 pcbc aesni_intel cfg80211 mii aes_x86_64 ahci libahci crypto_simd wmi_bmof shpchp sparse_keymap i2c_piix4 cryptd glue_helper psmouse mac_hid wmi ip_tables x_tables hid_generic uas serio_raw usbhid usb_storage hid video
[   34.274391] CPU: 0 PID: 766 Comm: modprobe Not tainted 4.17.0-rc1 #12
[   34.274393] Hardware name: Acer Aspire TC-380/Aspire TC-380, BIOS D05 02/01/2018
[   34.274400] RIP: 0010:prefetch_freepointer+0x14/0x30
[   34.274402] RSP: 0018:ffffb860482d7838 EFLAGS: 00010206
[   34.274404] RAX: 0000000000000000 RBX: ffff9ed4392e2400 RCX: 000000000000064d
[   34.274406] RDX: 000000000000064c RSI: 69a633535ad058a5 RDI: ffff9ed43f006e80
[   34.274408] RBP: ffffb860482d7838 R08: ffff9ed43f627160 R09: ffff9ed41a3d6900
[   34.274410] R10: 0000000000000024 R11: ffff9ed4203278cf R12: 00000000014080c0
[   34.274412] R13: ffff9ed43f006e80 R14: ffff9ed4392e2400 R15: ffff9ed43f006e80
[   34.274415] FS:  00007f3c39e38700(0000) GS:ffff9ed43f600000(0000) knlGS:0000000000000000
[   34.274417] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.274419] CR2: 00007f8b1427e000 CR3: 00000007fb030000 CR4: 00000000003406f0
[   34.274421] Call Trace:
[   34.274426]  kmem_cache_alloc_trace+0xbb/0x1d0
[   34.274479]  ? dm_hw_init+0x476/0xe60 [amdgpu]
[   34.274522]  dm_hw_init+0x476/0xe60 [amdgpu]
[   34.274526]  ? vprintk_func+0x27/0x60
[   34.274528]  ? printk+0x52/0x6e
[   34.274564]  amdgpu_device_init+0x13c5/0x1490 [amdgpu]
[   34.274599]  amdgpu_driver_load_kms+0x8b/0x2c0 [amdgpu]
[   34.274611]  drm_dev_register+0x149/0x1e0 [drm]
[   34.274646]  amdgpu_pci_probe+0x13f/0x1f0 [amdgpu]
[   34.274650]  local_pci_probe+0x4a/0xa0
[   34.274652]  pci_device_probe+0x109/0x1b0
[   34.274655]  driver_probe_device+0x2bb/0x4a0
[   34.274658]  __driver_attach+0xe2/0xf0
[   34.274660]  ? driver_probe_device+0x4a0/0x4a0
[   34.274663]  bus_for_each_dev+0x6a/0xc0
[   34.274665]  ? kmem_cache_alloc_trace+0x1c4/0x1d0
[   34.274667]  driver_attach+0x1e/0x20
[   34.274670]  bus_add_driver+0x170/0x260
[   34.274672]  driver_register+0x60/0xe0
[   34.274674]  ? 0xffffffffc0a76000
[   34.274677]  __pci_register_driver+0x5a/0x60
[   34.274711]  amdgpu_init+0x7a/0x89 [amdgpu]
[   34.274714]  do_one_initcall+0x52/0x1cd
[   34.274717]  ? __vunmap+0x81/0xb0
[   34.274720]  ? _cond_resched+0x1a/0x50
[   34.274722]  ? kmem_cache_alloc_trace+0xbb/0x1d0
[   34.274725]  ? do_init_module+0x27/0x219
[   34.274727]  do_init_module+0x5f/0x219
[   34.274729]  load_module+0x260e/0x2e10
[   34.274733]  __do_sys_finit_module+0xe5/0x120
[   34.274735]  ? __do_sys_finit_module+0xe5/0x120
[   34.274738]  __x64_sys_finit_module+0x1a/0x20
[   34.274741]  do_syscall_64+0x5a/0x110
[   34.274743]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   34.274745] RIP: 0033:0x7f3c3997d229
[   34.274747] RSP: 002b:00007ffdc43c4e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   34.274750] RAX: ffffffffffffffda RBX: 000055be49331280 RCX: 00007f3c3997d229
[   34.274752] RDX: 0000000000000000 RSI: 000055be487e9638 RDI: 000000000000000d
[   34.274754] RBP: 000055be487e9638 R08: 0000000000000000 R09: 0000000000000000
[   34.274756] R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
[   34.274758] R13: 000055be493313b0 R14: 0000000000040000 R15: 0000000000000000
[   34.274760] Code: 5b 90 84 e8 12 88 e9 ff eb 91 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 85 f6 48 89 e5 74 13 8b 47 20 48 01 c6 <48> 33 36 48 33 b7 38 01 00 00 0f 18 0e 5d c3 0f 1f 00 66 2e 0f 
[   34.274785] RIP: prefetch_freepointer+0x14/0x30 RSP: ffffb860482d7838
[   34.274788] ---[ end trace 2cfc1725d9f54c54 ]---
Comment 17 jian-hong 2018-04-26 03:54:53 UTC
Created attachment 139117 [details]
dmesg of loading amdgpu module - tested in 4.17-rc2

I tried Linux kernel 4.17-rc2 and loaded the amdgpu module manually.
The error looks the same as in the test with Linux kernel v4.16-rc7: "general protection fault: 0000 [#1] SMP NOPTI" when loading the amdgpu module.

[   44.572718] [drm] use_doorbell being set to: [true]
[   44.583120] [drm] Found VCN firmware Version: 1.45 Family ID: 18
[   44.748459] amdgpu: [powerplay] dpm has been enabled
[   44.762642] [drm] Display Core initialized with v3.1.38!
[   44.762679] general protection fault: 0000 [#1] SMP NOPTI
[   44.762682] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched ttm drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt efi_pstore cmac bnep snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel arc4 snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer edac_mce_amd btusb ath10k_pci ath10k_core snd ath mac80211 btrtl kvm_amd btbcm btintel bluetooth ecdh_generic ccp kvm cfg80211 soundcore irqbypass input_leds r8169 mii crct10dif_pclmul shpchp crc32_pclmul ghash_clmulni_intel wmi_bmof sparse_keymap pcbc aesni_intel aes_x86_64 crypto_simd cryptd i2c_piix4 psmouse glue_helper mac_hid wmi video zram ip_tables x_tables uas usb_storage serio_raw ahci libahci hid_generic usbhid hid
[   44.762727] CPU: 4 PID: 843 Comm: modprobe Not tainted 4.17.0-rc2 #1
[   44.762729] Hardware name: Acer Aspire TC-380/Aspire TC-380, BIOS D05 02/01/2018
[   44.762736] RIP: 0010:prefetch_freepointer+0x14/0x30
[   44.762737] RSP: 0018:ffffc1efc3d37820 EFLAGS: 00010282
[   44.762740] RAX: 0000000000000000 RBX: ffff9c63df73d400 RCX: 0000000000001004
[   44.762742] RDX: 0000000000001003 RSI: a5bd8d6388f3d443 RDI: ffff9c63ff006e80
[   44.762744] RBP: ffffc1efc3d37820 R08: ffff9c63ff727160 R09: ffff9c63ff007c00
[   44.762746] R10: ffffc1efc3d37830 R11: ffff9c63f7637c97 R12: 00000000014080c0
[   44.762748] R13: ffff9c63ff006e80 R14: ffff9c63df73d400 R15: ffff9c63ff006e80
[   44.762751] FS:  00007f373bef4700(0000) GS:ffff9c63ff700000(0000) knlGS:0000000000000000
[   44.762753] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   44.762755] CR2: 00007f569d13a740 CR3: 00000007e044c000 CR4: 00000000003406e0
[   44.762757] Call Trace:
[   44.762761]  kmem_cache_alloc_trace+0xbb/0x1d0
[   44.762815]  ? dm_crtc_reset_state+0x34/0x50 [amdgpu]
[   44.762861]  dm_crtc_reset_state+0x34/0x50 [amdgpu]
[   44.762904]  dm_hw_init+0x3d3/0xe60 [amdgpu]
[   44.762908]  ? vprintk_func+0x27/0x60
[   44.762910]  ? printk+0x52/0x6e
[   44.762946]  amdgpu_device_init+0x13c5/0x1490 [amdgpu]
[   44.762982]  amdgpu_driver_load_kms+0x8b/0x2c0 [amdgpu]
[   44.762994]  drm_dev_register+0x149/0x1e0 [drm]
[   44.763029]  amdgpu_pci_probe+0x13f/0x1f0 [amdgpu]
[   44.763033]  local_pci_probe+0x4a/0xa0
[   44.763035]  pci_device_probe+0x109/0x1b0
[   44.763039]  driver_probe_device+0x2bb/0x4a0
[   44.763041]  __driver_attach+0xe2/0xf0
[   44.763043]  ? driver_probe_device+0x4a0/0x4a0
[   44.763046]  bus_for_each_dev+0x6a/0xc0
[   44.763049]  ? kmem_cache_alloc_trace+0x1c4/0x1d0
[   44.763051]  driver_attach+0x1e/0x20
[   44.763053]  bus_add_driver+0x170/0x260
[   44.763055]  driver_register+0x60/0xe0
[   44.763058]  ? 0xffffffffc0851000
[   44.763060]  __pci_register_driver+0x5a/0x60
[   44.763094]  amdgpu_init+0x7a/0x89 [amdgpu]
[   44.763097]  do_one_initcall+0x52/0x1cd
[   44.763100]  ? __vunmap+0x81/0xb0
[   44.763103]  ? _cond_resched+0x1a/0x50
[   44.763105]  ? kmem_cache_alloc_trace+0xbb/0x1d0
[   44.763108]  ? do_init_module+0x27/0x219
[   44.763110]  do_init_module+0x5f/0x219
[   44.763113]  load_module+0x260e/0x2e10
[   44.763116]  __do_sys_finit_module+0xe5/0x120
[   44.763118]  ? __do_sys_finit_module+0xe5/0x120
[   44.763121]  __x64_sys_finit_module+0x1a/0x20
[   44.763124]  do_syscall_64+0x5a/0x110
[   44.763126]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   44.763128] RIP: 0033:0x7f373ba3a229
[   44.763130] RSP: 002b:00007ffdf0c03608 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   44.763133] RAX: ffffffffffffffda RBX: 0000560533d62400 RCX: 00007f373ba3a229
[   44.763135] RDX: 0000000000000000 RSI: 0000560533354638 RDI: 000000000000000d
[   44.763137] RBP: 0000560533354638 R08: 0000000000000000 R09: 0000000000000000
[   44.763139] R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
[   44.763141] R13: 0000560533d62530 R14: 0000000000040000 R15: 0000000000000000
[   44.763143] Code: 5d 30 8f e8 72 87 e9 ff eb 91 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 85 f6 48 89 e5 74 13 8b 47 20 48 01 c6 <48> 33 36 48 33 b7 38 01 00 00 0f 18 0e 5d c3 0f 1f 00 66 2e 0f 
[   44.763169] RIP: prefetch_freepointer+0x14/0x30 RSP: ffffc1efc3d37820
[   44.763171] ---[ end trace 87db4d1e492910da ]---
Comment 18 Alex Deucher 2018-05-09 18:49:43 UTC
What physical display connectors are on the board?  Can you attach a copy of the vbios from your system?  You can access the vbios via /sys/kernel/debug/dri/0/amdgpu_vbios

Is this a regression?  If so, when was the last time it worked correctly?
Comment 19 Alex Deucher 2018-05-09 19:06:14 UTC
Does this patch help?
https://patchwork.freedesktop.org/patch/218586/
Comment 20 jian-hong 2018-05-10 07:10:16 UTC
Created attachment 139456 [details]
amdgpu_vbios

This is the amdgpu_vbios copied from /sys/kernel/debug/dri/0/amdgpu_vbios on this Acer desktop.
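
In case it helps others attach their own image, the copy can be sketched like this. It assumes debugfs is mounted and the script runs as root; the function names are made up for illustration. The only check is the 0x55 0xAA signature that a PCI expansion ROM starts with, so it catches an empty or truncated read but does not validate the image.

```python
from pathlib import Path


def looks_like_option_rom(data):
    """True if the data starts with the PCI expansion ROM signature 0x55 0xAA."""
    return len(data) >= 2 and data[0] == 0x55 and data[1] == 0xAA


def copy_vbios(src="/sys/kernel/debug/dri/0/amdgpu_vbios", dst="amdgpu_vbios"):
    """Copy the vbios image to dst and return its size in bytes."""
    data = Path(src).read_bytes()
    if not looks_like_option_rom(data):
        raise ValueError(f"{src} does not start with 0x55 0xAA")
    Path(dst).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    print(copy_vbios(), "bytes copied")
```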
Comment 21 jian-hong 2018-05-10 09:27:56 UTC
Regarding the question "What physical display connectors are on the board?":

There is one HDMI connector and one VGA connector on this motherboard.  I have tried both connectors, and both of them hit this bug.
Comment 22 jian-hong 2018-05-10 09:50:03 UTC
Created attachment 139462 [details]
dmesg of loading amdgpu module with patch 218586

I tried the amd-staging-drm-next branch of git://people.freedesktop.org/~agd5f/linux. The last commit is "905aa01b240f9216b6dbba3226bf10b2d96eebb7 drm/amd/display: clean up assignment of amdgpu_crtc", which contains the patch https://patchwork.freedesktop.org/patch/218586/ .

However, the system still hung with the following error.  The attachment is the full dmesg.

[   45.184948] amdgpu: [powerplay] dpm has been enabled
[   45.185017] general protection fault: 0000 [#1] SMP NOPTI
[   45.185021] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched ttm drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt efi_pstore cmac bnep edac_mce_amd arc4 btusb snd_hda_codec_realtek snd_hda_codec_hdmi kvm_amd btrtl btbcm ccp btintel kvm bluetooth snd_hda_codec_generic snd_hda_intel snd_hda_codec irqbypass crct10dif_pclmul crc32_pclmul snd_hda_core snd_hwdep ghash_clmulni_intel pcbc input_leds aesni_intel aes_x86_64 snd_pcm ecdh_generic crypto_simd wmi_bmof sparse_keymap snd_timer glue_helper r8169 cryptd snd soundcore mii ahci ath10k_pci ath10k_core psmouse ath mac80211 cfg80211 libahci shpchp wmi i2c_piix4 tpm_crb mac_hid zram ip_tables x_tables hid_generic usbhid hid serio_raw video uas usb_storage
[   45.185098] CPU: 4 PID: 848 Comm: modprobe Not tainted 4.16.0-rc7+ #1
[   45.185102] Hardware name: Acer Aspire TC-380/Aspire TC-380, BIOS D05 02/01/2018
[   45.185113] RIP: 0010:prefetch_freepointer+0x15/0x30
[   45.185117] RSP: 0018:ffffafb583d97730 EFLAGS: 00010206
[   45.185121] RAX: 0000000000000000 RBX: ffff9c429b792400 RCX: 0000000000000cc8
[   45.185124] RDX: 0000000000000cc7 RSI: 7c4c9a908a85a5b1 RDI: ffff9c42bf006e80
[   45.185128] RBP: ffffafb583d97730 R08: ffff9c42bf727160 R09: ffffffffc08c9d31
[   45.185131] R10: ffffafb583d97738 R11: 0000000000000000 R12: 00000000014080c0
[   45.185135] R13: ffff9c42bf006e80 R14: ffff9c429b792400 R15: ffff9c42bf006e80
[   45.185139] FS:  00007f43bfa74700(0000) GS:ffff9c42bf700000(0000) knlGS:0000000000000000
[   45.185143] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.185146] CR2: 00007f65fb3c1740 CR3: 00000007db678000 CR4: 00000000003406e0
[   45.185150] Call Trace:
[   45.185157]  kmem_cache_alloc_trace+0xa5/0x1b0
[   45.185259]  ? dcn10_create_resource_pool+0x41/0x9d0 [amdgpu]
[   45.185345]  dcn10_create_resource_pool+0x41/0x9d0 [amdgpu]
[   45.185430]  ? dal_aux_engine_construct+0x12/0x30 [amdgpu]
[   45.185511]  ? dal_aux_engine_dce110_create+0x3f/0x80 [amdgpu]
[   45.185593]  dc_create_resource_pool+0x40/0x170 [amdgpu]
[   45.185598]  ? _cond_resched+0x1a/0x50
[   45.185602]  ? __kmalloc+0x1d5/0x210
[   45.185685]  ? dal_gpio_service_create+0x97/0x110 [amdgpu]
[   45.185765]  dc_create+0x22f/0x660 [amdgpu]
[   45.185848]  dm_hw_init+0xc3/0x250 [amdgpu]
[   45.185897]  amdgpu_device_init+0x13a6/0x1470 [amdgpu]
[   45.185947]  amdgpu_driver_load_kms+0x8b/0x2c0 [amdgpu]
[   45.185966]  drm_dev_register+0x146/0x1d0 [drm]
[   45.186010]  amdgpu_pci_probe+0x13f/0x1f0 [amdgpu]
[   45.186015]  local_pci_probe+0x45/0xa0
[   45.186018]  pci_device_probe+0x109/0x1b0
[   45.186022]  driver_probe_device+0x2b2/0x490
[   45.186024]  __driver_attach+0xdf/0xf0
[   45.186026]  ? driver_probe_device+0x490/0x490
[   45.186030]  bus_for_each_dev+0x64/0xb0
[   45.186032]  ? kmem_cache_alloc_trace+0x1a4/0x1b0
[   45.186034]  driver_attach+0x1e/0x20
[   45.186037]  bus_add_driver+0x170/0x260
[   45.186039]  driver_register+0x60/0xe0
[   45.186042]  ? 0xffffffffc0a26000
[   45.186045]  __pci_register_driver+0x5a/0x60
[   45.186087]  amdgpu_init+0x7a/0x89 [amdgpu]
[   45.186091]  do_one_initcall+0x52/0x193
[   45.186094]  ? __vunmap+0x81/0xb0
[   45.186096]  ? _cond_resched+0x1a/0x50
[   45.186098]  ? kmem_cache_alloc_trace+0xa5/0x1b0
[   45.186102]  ? do_init_module+0x27/0x219
[   45.186104]  do_init_module+0x5f/0x219
[   45.186106]  load_module+0x25b5/0x2dd0
[   45.186111]  ? ima_post_read_file+0x83/0xa0
[   45.186114]  SYSC_finit_module+0xe5/0x120
[   45.186117]  ? SYSC_finit_module+0xe5/0x120
[   45.186120]  SyS_finit_module+0xe/0x10
[   45.186122]  do_syscall_64+0x6d/0x120
[   45.186126]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   45.186128] RIP: 0033:0x7f43bf5b7229
[   45.186130] RSP: 002b:00007ffcd696fdf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   45.186133] RAX: ffffffffffffffda RBX: 000055974fddf420 RCX: 00007f43bf5b7229
[   45.186135] RDX: 0000000000000000 RSI: 000055974ecf8638 RDI: 000000000000000d
[   45.186137] RBP: 000055974ecf8638 R08: 0000000000000000 R09: 0000000000000000
[   45.186139] R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
[   45.186141] R13: 000055974fddf550 R14: 0000000000040000 R15: 0000000000000000
[   45.186144] Code: 49 8b 74 24 60 48 c7 c7 f0 07 8f 99 e8 05 f0 ea ff eb 90 0f 1f 00 0f 1f 44 00 00 55 48 85 f6 48 89 e5 74 14 48 63 47 20 48 01 c6 <48> 33 36 48 33 b7 40 01 00 00 0f 18 0e 5d c3 66 90 66 2e 0f 1f 
[   45.186169] RIP: prefetch_freepointer+0x15/0x30 RSP: ffffafb583d97730
[   45.186172] ---[ end trace 221043aa704603ca ]---
Comment 23 Daniel Drake 2018-05-10 13:20:05 UTC
> Is this a regression?  If so, when was the last time it worked correctly?

Thanks for looking at this. It is not a regression; we have yet to find a kernel that can load amdgpu without crashing on these platforms.
Comment 24 jian-hong 2018-05-11 03:20:26 UTC
Created attachment 139485 [details]
dmesg of loading amdgpu module - tested in 4.17-rc4

I also tried Linux kernel 4.17-rc4.  However, the system still hung with the following error.  The full dmesg is attached.

[   81.260098] [drm] use_doorbell being set to: [true]
[   81.270714] [drm] Found VCN firmware Version: 1.45 Family ID: 18
[   81.435566] amdgpu: [powerplay] dpm has been enabled
[   81.435840] general protection fault: 0000 [#1] SMP NOPTI
[   81.435844] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) chash gpu_sched ttm drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt deflate efi_pstore cmac bnep arc4 edac_mce_amd kvm_amd ccp kvm btusb btrtl btbcm btintel bluetooth ecdh_generic input_leds irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hda_codec_realtek ath10k_pci aesni_intel snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel aes_x86_64 ath10k_core ath r8169 sparse_keymap mii crypto_simd mac80211 cfg80211 wmi_bmof snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer cryptd snd glue_helper psmouse soundcore i2c_piix4 shpchp k10temp video wmi mac_hid zram ip_tables x_tables hid_generic serio_raw usbhid hid uas usb_storage ahci libahci
[   81.435919] CPU: 4 PID: 877 Comm: modprobe Not tainted 4.17.0-rc4+ #1
[   81.435922] Hardware name: Acer Aspire TC-380/Aspire TC-380, BIOS D05 02/01/2018
[   81.435932] RIP: 0010:prefetch_freepointer+0x14/0x30
[   81.435936] RSP: 0018:ffff99ec83ee33d8 EFLAGS: 00010202
[   81.435940] RAX: 0000000000000000 RBX: ffff88ad20ced800 RCX: 0000000000000fe4
[   81.435943] RDX: 0000000000000fe3 RSI: 0d95bdbe4f99878a RDI: ffff88ad3f006e80
[   81.435946] RBP: ffff99ec83ee33d8 R08: ffff88ad3f727160 R09: ffffffffc08abb4b
[   81.435950] R10: ffffc456df68a140 R11: 0000000000000001 R12: 00000000014080c0
[   81.435953] R13: ffff88ad3f006e80 R14: ffff88ad20ced800 R15: ffff88ad3f006e80
[   81.435957] FS:  00007f108441d700(0000) GS:ffff88ad3f700000(0000) knlGS:0000000000000000
[   81.435961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   81.435964] CR2: 00007f9eab130740 CR3: 00000007f66ac000 CR4: 00000000003406e0
[   81.435968] Call Trace:
[   81.435976]  kmem_cache_alloc_trace+0xbb/0x1d0
[   81.436062]  ? dal_ddc_service_create+0x3c/0x120 [amdgpu]
[   81.436139]  dal_ddc_service_create+0x3c/0x120 [amdgpu]
[   81.436221]  ? dal_gpio_service_create_irq+0x44/0x70 [amdgpu]
[   81.436296]  construct+0x249/0x750 [amdgpu]
[   81.436373]  link_create+0x38/0x60 [amdgpu]
[   81.436437]  dc_create+0x2dd/0x670 [amdgpu]
[   81.436488]  dm_hw_init+0xe8/0xe50 [amdgpu]
[   81.436493]  ? vprintk_func+0x27/0x60
[   81.436496]  ? printk+0x52/0x6e
[   81.436537]  amdgpu_device_init+0x13a4/0x1470 [amdgpu]
[   81.436579]  amdgpu_driver_load_kms+0x8b/0x2c0 [amdgpu]
[   81.436594]  drm_dev_register+0x146/0x1d0 [drm]
[   81.436635]  amdgpu_pci_probe+0x13f/0x1f0 [amdgpu]
[   81.436641]  local_pci_probe+0x45/0xa0
[   81.436643]  pci_device_probe+0x109/0x1b0
[   81.436647]  driver_probe_device+0x2b2/0x490
[   81.436650]  __driver_attach+0xdf/0xf0
[   81.436652]  ? driver_probe_device+0x490/0x490
[   81.436656]  bus_for_each_dev+0x64/0xb0
[   81.436658]  ? kmem_cache_alloc_trace+0x1bf/0x1d0
[   81.436661]  driver_attach+0x1e/0x20
[   81.436663]  bus_add_driver+0x170/0x260
[   81.436665]  driver_register+0x60/0xe0
[   81.436668]  ? 0xffffffffc09f0000
[   81.436671]  __pci_register_driver+0x5a/0x60
[   81.436712]  amdgpu_init+0x7a/0x89 [amdgpu]
[   81.436716]  do_one_initcall+0x4f/0x1c4
[   81.436719]  ? __vunmap+0x81/0xb0
[   81.436722]  ? _cond_resched+0x1a/0x50
[   81.436725]  ? kmem_cache_alloc_trace+0xbb/0x1d0
[   81.436728]  ? do_init_module+0x27/0x219
[   81.436731]  do_init_module+0x5f/0x219
[   81.436733]  load_module+0x25b5/0x2dd0
[   81.436737]  __do_sys_finit_module+0xe5/0x120
[   81.436739]  ? __do_sys_finit_module+0xe5/0x120
[   81.436742]  __x64_sys_finit_module+0x1a/0x20
[   81.436745]  do_syscall_64+0x54/0x110
[   81.436749]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   81.436751] RIP: 0033:0x7f1083f60229
[   81.436753] RSP: 002b:00007ffeb5502ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   81.436756] RAX: ffffffffffffffda RBX: 0000559dfae99480 RCX: 00007f1083f60229
[   81.436758] RDX: 0000000000000000 RSI: 0000559df9088638 RDI: 000000000000000d
[   81.436760] RBP: 0000559df9088638 R08: 0000000000000000 R09: 0000000000000000
[   81.436762] R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
[   81.436764] R13: 0000559dfae995b0 R14: 0000000000040000 R15: 0000000000000000
[   81.436767] Code: 58 30 b2 e8 72 f0 e9 ff eb 91 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 85 f6 48 89 e5 74 13 8b 47 20 48 01 c6 <48> 33 36 48 33 b7 38 01 00 00 0f 18 0e 5d c3 0f 1f 00 66 2e 0f 
[   81.436794] RIP: prefetch_freepointer+0x14/0x30 RSP: ffff99ec83ee33d8
[   81.436797] ---[ end trace 7ba07fc735cbae7d ]---
Comment 25 Alex Deucher 2018-05-11 15:23:57 UTC
Does the system boot ok with amdgpu blacklisted?  E.g., append modprobe.blacklist=amdgpu to the kernel command line in grub and boot to a non-GUI runlevel.
Comment 26 jian-hong 2018-05-11 15:51:02 UTC
Yes.  The system can boot with modprobe.blacklist=amdgpu and systemd.unit=multi-user.target.

This is how I tested and got the dmesg.
I booted with that configuration and got into the command-line environment.
Then I ran modprobe efi-pstore, which lets the dmesg be saved into EFI storage when the system hangs.
Then I ran modprobe amdgpu manually.  If the system hangs at that point, I can get the dmesg with the error information from EFI storage on the next boot.
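
For anyone repeating this procedure, the last step (reading the saved log after the reboot) could look roughly like the sketch below. It assumes the pstore filesystem is mounted at /sys/fs/pstore and that saved kernel logs appear there as files whose names start with "dmesg-"; both are assumptions about the test system, and the function name is only illustrative. Sorting by file name is not guaranteed to be chronological.

```python
from pathlib import Path


def collect_pstore_dmesg(pstore_dir="/sys/fs/pstore"):
    """Return (file name, text) pairs for saved dmesg records, sorted by name.

    The default directory and the "dmesg-" prefix are assumptions about how
    pstore exposes its records on the machine being tested.
    """
    root = Path(pstore_dir)
    if not root.is_dir():
        return []
    records = []
    for entry in sorted(root.iterdir()):
        if entry.is_file() and entry.name.startswith("dmesg-"):
            # Replace undecodable bytes instead of failing on a damaged record.
            records.append((entry.name, entry.read_text(errors="replace")))
    return records


if __name__ == "__main__":
    for name, text in collect_pstore_dmesg():
        print(f"===== {name} =====")
        print(text)
```

Reading the records usually requires root, and a record can be deleted afterwards to free space in the EFI variable store.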
Comment 27 Alex Deucher 2018-05-11 17:51:09 UTC
Can you post your kernel config?  I'm having trouble seeing how these crashes are even possible.
Comment 28 Aaron 2018-05-12 20:30:26 UTC
I am getting this issue on the default kernel configuration for Arch Linux as well as my 4.17 rc build. I'll attach both configs.
Comment 29 Aaron 2018-05-12 20:37:03 UTC
Created attachment 139530 [details]
Arch current default kernel config
Comment 30 jian-hong 2018-05-14 03:00:13 UTC
Created attachment 139549 [details]
config of Linux kernel with freedesktop branch

This is the config used to build the Linux kernel that produced the dmesg of loading the amdgpu module with patch 218586.
Comment 31 jian-hong 2018-05-14 03:04:18 UTC
Created attachment 139550 [details]
config of Linux kernel 4.17-rc4

This is the config used to build Linux kernel 4.17-rc4, which produced the attachment "dmesg of loading amdgpu module - tested in 4.17-rc4".
Comment 32 Paul Menzel 2018-07-18 13:47:25 UTC
I am also seeing this with Linux 4.18-rc5+ on a MSI MS-7A37/B350M MORTAR.

```
19.940: [    0.072004] ACPI BIOS Error (bug): Failure creating [\_SB.SMIC], AE_ALREADY_EXISTS (20180531/dswload2-316)
19.939: [    0.080004] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180531/psobject-221)
19.939: [    0.092000] ACPI Error: Ignore error and continue table load (20180531/psobject-604)
19.939: [    0.096021] ACPI BIOS Error (bug): Failure creating [\_SB.SMIB], AE_ALREADY_EXISTS (20180531/dsfield-594)
20.044: [    0.205797] AMD-Vi: Unable to write to IOMMU perf counter.
20.104: [    1.077147] ata9.00: failed to set xfermode (err_mask=0x40)
54.879: [   35.052403] sp5100-tco sp5100-tco: Watchdog hardware is disabled
54.924: [   35.104108] Error: Driver 'pcspkr' is already registered, aborting...
55.061: [   35.246972] kvm: disabled by bios
55.077: [   35.265941] kfd kfd: kgd2kfd_probe failed
55.344: [   35.537445] general protection fault: 0000 [#1] SMP NOPTI
55.344: [   35.543209] CPU: 0 PID: 367 Comm: systemd-udevd Not tainted 4.18.0-rc5+ #1
55.345: [   35.550371] Hardware name: MSI MS-7A37/B350M MORTAR (MS-7A37), BIOS 1.G1 05/17/2018
55.345: [   35.558562] RIP: 0010:prefetch_freepointer+0x10/0x20
55.346: [   35.563881] Code: 89 d3 e8 c3 fe 4a 00 85 c0 0f 85 31 75 00 00 48 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 48 85 f6 74 13 8b 47 20 48 01 c6 <48> 33 36 48 33 b7 38 01 00 00 0f 18 0e c3 66 90 0f 1f 44 00 00 48 
55.347: [   35.584215] RSP: 0018:ffff9c3181f77560 EFLAGS: 00010202
55.347: [   35.589849] RAX: 0000000000000000 RBX: 4b2c8be0a60f6ab9 RCX: 0000000000000ccc
55.348: [   35.597492] RDX: 0000000000000ccb RSI: 4b2c8be0a60f6ab9 RDI: ffff8d5b1e406e80
55.349: [   35.605166] RBP: ffff8d5b1e406e80 R08: ffff8d5b1e824f00 R09: ffffffffc0a45423
55.349: [   35.612808] R10: fffff1b60fff64c0 R11: ffff9c3181f77520 R12: 00000000006080c0
55.350: [   35.620451] R13: 0000000000000230 R14: ffff8d5b1e406e80 R15: ffff8d5b0c304400
55.350: [   35.628116] FS:  00007fb194e4b8c0(0000) GS:ffff8d5b1e800000(0000) knlGS:0000000000000000
55.351: [   35.636771] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
55.351: [   35.642931] CR2: 000055c0661d60d0 CR3: 000000040b9ee000 CR4: 00000000003406f0
55.352: [   35.650566] Call Trace:
55.352: [   35.653198]  kmem_cache_alloc_trace+0xb5/0x1c0
55.352: [   35.658077]  ? dal_ddc_service_create+0x38/0x110 [amdgpu]
55.353: [   35.663962]  dal_ddc_service_create+0x38/0x110 [amdgpu]
55.353: [   35.669656]  ? dal_gpio_create_irq+0x19/0x30 [amdgpu]
55.354: [   35.675155]  ? dal_gpio_service_create_irq+0x48/0x70 [amdgpu]
55.354: [   35.681413]  ? get_hpd_gpio+0x63/0x90 [amdgpu]
55.355: [   35.686281]  construct+0x201/0x640 [amdgpu]
55.355: [   35.690838]  link_create+0x33/0x50 [amdgpu]
55.356: [   35.695400]  dc_create+0x2d3/0x640 [amdgpu]
55.356: [   35.699980]  dm_hw_init+0xc8/0x130 [amdgpu]
55.356: [   35.704568]  amdgpu_device_init.cold.28+0x1043/0x11ee [amdgpu]
55.357: [   35.710923]  amdgpu_driver_load_kms+0x86/0x2c0 [amdgpu]
55.357: [   35.716535]  drm_dev_register+0x109/0x140 [drm]
55.358: [   35.721468]  amdgpu_pci_probe+0x13c/0x1c0 [amdgpu]
55.358: [   35.726612]  local_pci_probe+0x41/0x90
55.358: [   35.730628]  pci_device_probe+0x189/0x1a0
55.359: [   35.734952]  driver_probe_device+0x2b9/0x460
55.359: [   35.739544]  __driver_attach+0xdd/0x110
55.359: [   35.743633]  ? driver_probe_device+0x460/0x460
55.360: [   35.748383]  bus_for_each_dev+0x76/0xc0
55.360: [   35.752480]  ? klist_add_tail+0x3b/0x70
55.360: [   35.756574]  bus_add_driver+0x152/0x230
55.361: [   35.760692]  ? 0xffffffffc0c37000
55.361: [   35.764255]  driver_register+0x6b/0xb0
55.361: [   35.768265]  ? 0xffffffffc0c37000
55.361: [   35.771814]  do_one_initcall+0x46/0x1c3
55.362: [   35.775944]  ? _cond_resched+0x15/0x30
55.362: [   35.779940]  ? kmem_cache_alloc_trace+0x15c/0x1c0
55.362: [   35.784982]  ? do_init_module+0x22/0x210
55.362: [   35.789172]  do_init_module+0x5a/0x210
55.363: [   35.793227]  load_module+0x2124/0x2500
55.363: [   35.797250]  ? vfs_read+0x110/0x140
55.363: [   35.800985]  ? __do_sys_finit_module+0xa8/0x110
55.364: [   35.805837]  __do_sys_finit_module+0xa8/0x110
55.364: [   35.810516]  do_syscall_64+0x55/0xe0
55.364: [   35.814358]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
55.365: [   35.819768] RIP: 0033:0x7fb195ea8a79
55.365: [   35.823607] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d df 43 0c 00 f7 d8 64 89 01 48 
55.366: [   35.843819] RSP: 002b:00007ffec10a9518 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
55.367: [   35.851960] RAX: ffffffffffffffda RBX: 00005561d3488730 RCX: 00007fb195ea8a79
55.367: [   35.859595] RDX: 0000000000000000 RSI: 00007fb195bb00ed RDI: 0000000000000013
55.368: [   35.867206] RBP: 00007fb195bb00ed R08: 0000000000000000 R09: 0000000000000000
55.368: [   35.874824] R10: 0000000000000013 R11: 0000000000000246 R12: 0000000000000000
55.369: [   35.882477] R13: 00005561d3478500 R14: 0000000000020000 R15: 00005561d3488730
55.370: [   35.890099] Modules linked in: amdkfd edac_mce_amd nls_ascii ppdev ccp wmi_bmof rng_core nls_cp437 vfat fat kvm amdgpu(+) irqbypass snd_hda_codec_realtek snd_hda_codec_generic chash gpu_sched snd_hda_codec_hdmi ttm efi_pstore crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel snd_hda_intel snd_hda_codec drm pcspkr efivars snd_hda_core sp5100_tco k10temp snd_hwdep snd_pcm r8169 snd_timer snd sg i2c_piix4 soundcore i2c_algo_bit mii parport_pc wmi parport video button acpi_cpufreq efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto dm_crypt dm_mod raid10 raid456 libcrc32c crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear md_mod sd_mod evdev hid_generic usbhid hid crc32c_intel aesni_intel aes_x86_64 crypto_simd
55.375: [   35.966200]  xhci_pci cryptd glue_helper ahci libahci xhci_hcd libata usbcore scsi_mod gpio_amdpt gpio_generic
55.375: [   35.976975] ---[ end trace a9eb7c09b06a0207 ]---
```
Comment 33 Paul Menzel 2018-07-19 17:27:14 UTC
With Linux 4.18-rc5+ and drm-tip merged, the general protection fault happens at a different point, now reached from `initialize_plane`.

```
15.389: [   24.739453] [drm] Display Core initialized with v3.1.52!
15.389: [   24.746057] general protection fault: 0000 [#1] SMP NOPTI
15.390: [   24.751824] CPU: 0 PID: 370 Comm: systemd-udevd Tainted: G        W         4.18.0-rc5+ #2
15.390: [   24.751825] Hardware name: MSI MS-7A37/B350M MORTAR (MS-7A37), BIOS 1.G1 05/17/2018
15.391: [   24.751829] RIP: 0010:prefetch_freepointer+0x10/0x20
15.391: [   24.774361] Code: 89 d3 e8 c3 53 4b 00 85 c0 0f 85 71 78 00 00 48 83 c4 08 5b 5d 41 5c 41 5d c3 0f 1f 44 00 00 48 85 f6 74 13 8b 47 20 48 01 c6 <48> 33 36 48 33 b7 38 01 00 00 0f 18 0e c3 66 90 0f 1f 44 00 00 48 
15.393: [   24.794677] RSP: 0018:ffffa31f0259f900 EFLAGS: 00010202
15.393: [   24.794678] RAX: 0000000000000000 RBX: 18b43346a3036634 RCX: 00000000000007fc
15.394: [   24.794678] RDX: 00000000000007fb RSI: 18b43346a3036634 RDI: ffff97d5de406e80
15.394: [   24.794680] RBP: ffff97d5ca9b3400 R08: ffff97d5de824f00 R09: ffffffffb273adb0
15.395: [   24.823335] R10: ffffffffb273adac R11: 0000000000000007 R12: 00000000006080c0
15.395: [   24.823335] R13: ffff97d5de406e80 R14: 0000000000000290 R15: ffff97d5de406e80
15.396: [   24.823336] FS:  00007fd442ec88c0(0000) GS:ffff97d5de800000(0000) knlGS:0000000000000000
15.396: [   24.823338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
15.397: [   24.853319] CR2: 00007fd34c0034c8 CR3: 0000000403e6c000 CR4: 00000000003406f0
15.397: [   24.853320] Call Trace:
15.397: [   24.853323]  kmem_cache_alloc_trace+0xb5/0x1f0
15.398: [   24.868339]  ? initialize_plane+0x27/0x97 [amdgpu]
15.398: [   24.868380]  initialize_plane+0x27/0x97 [amdgpu]
15.399: [   24.878524]  amdgpu_dm_initialize_drm_device+0x149/0xb34 [amdgpu]
[…]
```
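
To compare crash sites across the oopses in this report, a small sketch like the following can pull out the first reliable caller of `kmem_cache_alloc_trace` from a pasted call trace. Frames prefixed with `?` are skipped because the unwinder marks them as unreliable. The regular expression and function name are illustrative and only assume the frame layout visible in the logs above (`symbol+0xoff/0xsize [module]`).

```python
import re

# Matches a call-trace frame such as
#   "[   24.868380]  initialize_plane+0x27/0x97 [amdgpu]"
# with an optional "? " marker and an optional "[module]" suffix.
FRAME = re.compile(
    r"\]\s+(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\s*$"
)


def first_reliable_caller(lines, callee="kmem_cache_alloc_trace"):
    """Return (symbol, module) of the first non-'?' frame after `callee`."""
    seen_callee = False
    for line in lines:
        match = FRAME.search(line)
        if not match:
            continue
        unreliable, symbol, module = match.groups()
        if not seen_callee:
            seen_callee = symbol == callee and not unreliable
            continue
        if not unreliable:
            return symbol, module
    return None


if __name__ == "__main__":
    trace = [
        "15.397: [   24.853323]  kmem_cache_alloc_trace+0xb5/0x1f0",
        "15.398: [   24.868339]  ? initialize_plane+0x27/0x97 [amdgpu]",
        "15.398: [   24.868380]  initialize_plane+0x27/0x97 [amdgpu]",
    ]
    print(first_reliable_caller(trace))
```

Applied to the traces in this report, the caller varies (`dcn10_create_resource_pool`, `dal_ddc_service_create`, `initialize_plane`) while the faulting function stays `prefetch_freepointer`.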
Comment 34 Paul Menzel 2018-07-20 14:57:25 UTC
When booting with `systemd.unit=multi-user.target`, so that GDM is not started, amdgpu doesn’t crash.

[   14.975926] [drm] Found VCN firmware Version: 1.73 Family ID: 18
[   15.144612] amdgpu: [powerplay] dpm has been enabled
[   15.150055] [drm] DM_PPLIB: values for Invalid clock
[   15.155237] [drm] DM_PPLIB:	 0 in kHz
[   15.159053] [drm] DM_PPLIB:	 400000 in kHz
[   15.163319] [drm] DM_PPLIB:	 933000 in kHz
[   15.167552] [drm] DM_PPLIB:	 1067000 in kHz
[   15.171934] WARNING: CPU: 2 PID: 360 at drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1355 dcn_bw_update_from_pplib+0x16b/0x280 [amdgpu]
[   15.185819] Modules linked in: edac_mce_amd kvm_amd nls_ascii ccp nls_cp437 amdkfd rng_core kvm vfat snd_hda_codec_realtek fat irqbypass amdgpu(+) snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel crct10dif_pclmul chash snd_hda_codec i2c_algo_bit gpu_sched crc32_pclmul drm_kms_helper ghash_clmulni_intel efi_pstore snd_hda_core uas syscopyarea snd_hwdep sysfillrect sysimgblt fb_sys_fops snd_pcm snd_timer usb_storage ttm efivars pcspkr sp5100_tco r8169 snd sg soundcore mii drm k10temp i2c_piix4 video button efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt dm_mod sd_mod evdev hid_generic usbhid hid crc32c_intel ahci xhci_pci libahci aesni_intel aes_x86_64 xhci_hcd crypto_simd libata cryptd glue_helper usbcore scsi_mod gpio_amdpt gpio_generic
[   15.259960] CPU: 2 PID: 360 Comm: systemd-udevd Not tainted 4.18.0-rc5+ #3
[   15.267349] Hardware name: MSI MS-7A37/B350M MORTAR (MS-7A37), BIOS 1.G1 05/17/2018
[   15.275647] RIP: 0010:dcn_bw_update_from_pplib+0x16b/0x280 [amdgpu]
[   15.282360] Code: d8 ca d8 f1 d9 5a 50 8b 44 fc 14 49 8b 94 24 70 01 00 00 48 89 04 24 df 2c 24 d8 f1 db 42 78 de c9 de ca de f9 d9 5a 4c eb 02 <0f> 0b 48 89 da be 04 00 00 00 48 89 ef e8 33 5a fe ff 84 c0 0f 84 
[   15.302582] RSP: 0018:ffffa6ce81fa77c8 EFLAGS: 00010246
[   15.308188] RAX: 0000000000000001 RBX: ffffa6ce81fa7828 RCX: 0000000000000000
[   15.315825] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9530de896738
[   15.323477] RBP: ffff9530c3014540 R08: 00000000000008d8 R09: 0000000000000000
[   15.331106] R10: 0000000000000002 R11: 000000000000000f R12: ffff9530c2f29000
[   15.338734] R13: ffff9530ca9bb4c0 R14: ffff9530c2f29000 R15: 0000000000000000
[   15.346383] FS:  00007fc5b86178c0(0000) GS:ffff9530de880000(0000) knlGS:0000000000000000
[   15.355050] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   15.361199] CR2: 00007fe4fb3affe8 CR3: 000000040b0de000 CR4: 00000000003406e0
[   15.368848] Call Trace:
[   15.371538]  dcn10_create_resource_pool+0x75b/0x9a0 [amdgpu]
[   15.377658]  dc_create_resource_pool+0x42/0x180 [amdgpu]
[   15.383360]  ? __kmalloc+0x1b4/0x250
[   15.387260]  ? dal_gpio_service_create+0x8f/0x110 [amdgpu]
[   15.393203]  dc_create+0x228/0x650 [amdgpu]
[   15.397734]  ? amdgpu_cgs_create_device+0x23/0x50 [amdgpu]
[   15.403662]  dm_hw_init+0xc8/0x130 [amdgpu]
[   15.408199]  amdgpu_device_init.cold.28+0x10ea/0x1295 [amdgpu]
[   15.414489]  amdgpu_driver_load_kms+0x86/0x2c0 [amdgpu]
[   15.420085]  drm_dev_register+0x109/0x140 [drm]
[   15.424990]  amdgpu_pci_probe+0x13c/0x1c0 [amdgpu]
[   15.430150]  local_pci_probe+0x41/0x90
[   15.434194]  pci_device_probe+0x189/0x1a0
[   15.438517]  driver_probe_device+0x2b9/0x460
[   15.443091]  __driver_attach+0xdd/0x110
[   15.447220]  ? driver_probe_device+0x460/0x460
[   15.451987]  bus_for_each_dev+0x76/0xc0
[   15.456116]  ? klist_add_tail+0x3b/0x70
[   15.460234]  bus_add_driver+0x152/0x230
[   15.464353]  ? 0xffffffffc09cd000
[   15.467905]  driver_register+0x6b/0xb0
[   15.471926]  ? 0xffffffffc09cd000
[   15.475484]  do_one_initcall+0x46/0x1c3
[   15.479588]  ? kmem_cache_alloc_trace+0x183/0x1f0
[   15.484654]  ? do_init_module+0x22/0x210
[   15.488885]  do_init_module+0x5a/0x210
[   15.492911]  load_module+0x21c4/0x2410
[   15.496935]  ? vfs_read+0x110/0x140
[   15.500678]  ? __do_sys_finit_module+0xa8/0x110
[   15.505534]  __do_sys_finit_module+0xa8/0x110
[   15.510196]  do_syscall_64+0x55/0xe0
[   15.514050]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   15.519450] RIP: 0033:0x7fc5b9674a79
[   15.523286] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d df 43 0c 00 f7 d8 64 89 01 48 
[   15.543529] RSP: 002b:00007fffa4622ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   15.551649] RAX: ffffffffffffffda RBX: 000055663e956cc0 RCX: 00007fc5b9674a79
[   15.559309] RDX: 0000000000000000 RSI: 00007fc5b937c0ed RDI: 0000000000000018
[   15.566963] RBP: 00007fc5b937c0ed R08: 0000000000000000 R09: 0000000000000000
[   15.574611] R10: 0000000000000018 R11: 0000000000000246 R12: 0000000000000000
[   15.582257] R13: 000055663e936990 R14: 0000000000020000 R15: 000055663e956cc0
[   15.589979] WARNING: CPU: 2 PID: 360 at drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1355 dcn_bw_update_from_pplib+0x16b/0x280 [amdgpu]
[   15.604311] ---[ end trace a12997ffe6f8dea6 ]---
[   15.609258] [drm] DM_PPLIB: values for Invalid clock
[   15.614633] [drm] DM_PPLIB:	 300000 in kHz
[   15.619072] [drm] DM_PPLIB:	 600000 in kHz
[   15.623518] [drm] DM_PPLIB:	 626000 in kHz
[   15.627989] [drm] DM_PPLIB:	 654000 in kHz
[   15.635929] [drm] Display Core initialized with v3.1.52!
[   15.663219] [drm] SADs count is: -2, don't need to read it
[   15.679318] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   15.686300] [drm] Driver supports precise vblank timestamp query.
[   15.710275] [drm] VCN decode and encode initialized successfully.
[   15.729477] [drm] fb mappable at 0xA1100000
[   15.733926] [drm] vram apper at 0xA0000000
[   15.738182] [drm] size 9216000
[   15.741407] [drm] fb depth is 24
[   15.744783] [drm]    pitch is 7680
[   15.748631] fbcon: amdgpudrmfb (fb0) is primary device
[   15.770534] Console: switching to colour frame buffer device 240x75
[   15.804032] amdgpu 0000:38:00.0: fb0: amdgpudrmfb frame buffer device
[   15.830137] amdgpu 0000:38:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[   15.837277] amdgpu 0000:38:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[   15.844955] amdgpu 0000:38:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[   15.844956] amdgpu 0000:38:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[   15.844957] amdgpu 0000:38:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[   15.844958] amdgpu 0000:38:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[   15.844959] amdgpu 0000:38:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[   15.844960] amdgpu 0000:38:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[   15.844961] amdgpu 0000:38:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[   15.844962] amdgpu 0000:38:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[   15.844963] amdgpu 0000:38:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[   15.844964] amdgpu 0000:38:00.0: ring 11(vcn_dec) uses VM inv eng 5 on hub 1
[   15.844965] amdgpu 0000:38:00.0: ring 12(vcn_enc0) uses VM inv eng 6 on hub 1
[   15.844966] amdgpu 0000:38:00.0: ring 13(vcn_enc1) uses VM inv eng 7 on hub 1
[   15.844967] amdgpu 0000:38:00.0: ring 14(vcn_jpeg) uses VM inv eng 8 on hub 1
[   15.855693] [drm] Initialized amdgpu 3.26.0 20150101 for 0000:38:00.0 on minor 0
[   15.951219] initcall amdgpu_init+0x0/0x86 [amdgpu] returned 0 after 954562 usecs

The trace is present in drm-tip (not in Linux master) and I reported it already in bug 107296 [1].

[1]: https://bugs.freedesktop.org/show_bug.cgi?id=107296
Comment 35 Paul Menzel 2018-07-23 11:33:18 UTC
Jian, I am curious, did you find a way to use the system? Did you return it or put it in a corner?
Comment 36 jian-hong 2018-07-24 09:27:51 UTC
Created attachment 140806 [details]
dmesg of 4.18.0-rc6

Tested with Linux 4.18.0-rc6 kernel.
It seems the amdgpu module could be loaded correctly. I tried over 20 times.

The attachment is the full dmesg.
Comment 37 jian-hong 2018-07-25 10:10:29 UTC
Thanks
Comment 38 Alex Deucher 2018-07-25 14:13:25 UTC
(In reply to jian-hong from comment #36)
> Created attachment 140806 [details]
> dmesg of 4.18.0-rc6
> 
> Tested with Linux 4.18.0-rc6 kernel.
> It seems the amdgpu module could be loaded correctly. I tried over 20 times.

Could you bisect to see what change fixed it?
Comment 39 Paul Menzel 2018-07-25 16:38:18 UTC
(In reply to jian-hong from comment #36)
> Created attachment 140806 [details]
> dmesg of 4.18.0-rc6
> 
> Tested with Linux 4.18.0-rc6 kernel.
> It seems the amdgpu module could be loaded correctly. I tried over 20 times.

I am still hitting this issue, though less often with 4.18-rc6. Could you please attach your configuration and your build instructions? It seems independent of the compiler (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1endless2bem1)), as I use GCC 8.1.0 from Debian Sid/unstable.
Comment 40 jian-hong 2018-07-26 08:21:33 UTC
Created attachment 140819 [details]
4.18.0-rc6 Build config file

Hi Paul,

The config file is attached.

I just ran "make" and "make modules_install install" with gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1endless2bem1).
Comment 41 Vlastimil Babka 2018-08-09 07:24:21 UTC
(In reply to jian-hong from comment #40)
> Created attachment 140819 [details]
> 4.18.0-rc6 Build config file

Paul Menzel posted this bug to linux-mm mailing list and I've noticed that all bug splats have code that indicates CONFIG_SLAB_FREELIST_HARDENED is enabled. Your latest .config above has it disabled and exhibits no bug, making it very suspicious. Possibly also the related CONFIG_SLAB_FREELIST_RANDOM. Can you try if enabling either (or both) back reintroduces the bug? Or others if disabling fixes it? Thanks.
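
As background for why these crashes show up as general protection faults: in each oops the marked bytes `<48> 33 36` decode to `xor (%rsi),%rsi`, so the faulting access dereferences RSI, and the RSI values in the dumps (for example 7c4c9a908a85a5b1) are not canonical x86-64 addresses. The sketch below checks that property, assuming 48-bit virtual addressing (4-level paging) on these machines; the helper name is illustrative.

```python
def is_canonical(addr, va_bits=48):
    """Return True if `addr` is a canonical x86-64 virtual address.

    With `va_bits`-bit virtual addressing, bits 63 down to va_bits-1 must be
    all zeros or all ones. va_bits=48 (4-level paging) is an assumption here.
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


if __name__ == "__main__":
    # RSI values copied from the four oopses in this report, plus the RDI
    # value (kmem_cache pointer) from the first one for comparison.
    samples = {
        "RSI 4.16-rc7": 0x7C4C9A908A85A5B1,
        "RSI 4.17-rc4": 0x0D95BDBE4F99878A,
        "RSI 4.18-rc5+ #1": 0x4B2C8BE0A60F6AB9,
        "RSI 4.18-rc5+ #2": 0x18B43346A3036634,
        "RDI 4.16-rc7": 0xFFFF9C42BF006E80,
    }
    for name, value in samples.items():
        print(f"{name}: {value:#018x} canonical={is_canonical(value)}")
```

All four RSI values fail the check while the RDI value passes, which is consistent with the allocator following a corrupted free-list pointer.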
Comment 42 jian-hong 2018-08-09 09:56:55 UTC
(In reply to Vlastimil Babka from comment #41)
> (In reply to jian-hong from comment #40)
> > Created attachment 140819 [details]
> > 4.18.0-rc6 Build config file
> 
> Paul Menzel posted this bug to linux-mm mailing list and I've noticed that
> all bug splats have code that indicates CONFIG_SLAB_FREELIST_HARDENED is
> enabled. Your latest .config above has it disabled and exhibits no bug,
> making it very suspicious. Possibly also the related
> CONFIG_SLAB_FREELIST_RANDOM. Can you try if enabling either (or both) back
> reintroduces the bug? Or others if disabling fixes it? Thanks.

Uh!  We just returned the TC-380 desktop to the ODM yesterday, so we do not have the target machine right now.
Comment 43 Paul Menzel 2018-08-09 12:21:04 UTC
I just tried with Linux master, and the general protection fault didn’t happen anymore.

    /boot/config-4.18.0-rc8-00004-gfedb8da96355:CONFIG_SLAB_FREELIST_HARDENED=y

Out of curiosity, I’ll try to bisect this.
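
To compare the two slab options across the attached configs, something like the following sketch could be used. It parses the text of a kernel config file (for example a /boot/config-* file) and reports the two options named in comment 41; the parser and function names are illustrative, and an option that is absent from the file is reported as "unknown".

```python
import re

SLAB_OPTIONS = ("CONFIG_SLAB_FREELIST_HARDENED", "CONFIG_SLAB_FREELIST_RANDOM")


def parse_kconfig(text):
    """Map option names to values; '# CONFIG_X is not set' becomes 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        enabled = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if enabled:
            options[enabled.group(1)] = enabled.group(2)
            continue
        disabled = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if disabled:
            options[disabled.group(1)] = "n"
    return options


def slab_hardening(text):
    """Return the state of the two slab options discussed in this bug."""
    options = parse_kconfig(text)
    return {name: options.get(name, "unknown") for name in SLAB_OPTIONS}


if __name__ == "__main__":
    sample = (
        "CONFIG_SLUB=y\n"
        "CONFIG_SLAB_FREELIST_HARDENED=y\n"
        "# CONFIG_SLAB_FREELIST_RANDOM is not set\n"
    )
    print(slab_hardening(sample))
```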
Comment 44 Jörn Frenzel 2019-04-29 15:24:25 UTC
Created attachment 144109 [details]
Oops on debian kernel 4.19.0-0.bpo.4-amd64

Hi,

the issue seems to persist in Debian 9 with kernel 4.19.0-0.bpo.4-amd64. My hardware:

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: HP
        Product Name: HP t530 Thin Client
        Version:  

00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] [1002:98e4] (rev 83)
        Subsystem: Hewlett-Packard Company Stoney [Radeon R2/R3/R4/R5 Graphics] [103c:8267]
        Kernel modules: amdgpu

Any new ideas on this?

Regards, Jörn
Comment 45 Michel Dänzer 2019-04-29 15:53:19 UTC
(In reply to Jörn Frenzel from comment #44)
> the issue seems to persist in Debian 9 with kernel 4.19.0-0.bpo.4-amd64.

The dmesg you attached looks like a different issue, please file your own report.
Comment 46 Jörn Frenzel 2019-05-29 13:36:07 UTC
Hi Michel,

sorry for the long delay.

What is the reason for your opinion? It really seems like the same issue to me, but maybe I'm wrong.

The same thing happens with Manjaro Linux and kernel 4.19.x: an oops.

We built a Debian kernel 4.19.37 (jessie backport kernel bpo.5) with CONFIG_SLAB_FREELIST_HARDENED=n (see Comment 41, Vlastimil Babka, 2018-08-09), but the oops still occurs.

To us this seems to be a general problem.

Regards, Jörn
Comment 47 Jörn Frenzel 2019-06-05 12:39:16 UTC
Hi,

the problem seems to be gone in newer kernel versions. I just tested SUSE Tumbleweed with kernel 5.1.5-1.

This is a satisfactory answer for me.

Regards, Jörn
Comment 48 Martin Peres 2019-11-19 08:33:13 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/327.

