Bug 104274 - Unable to cleanly unload kernel module: BUG: unable to handle kernel NULL pointer dereference at 0000000000000258 (mutex_lock)
Status: RESOLVED MOVED
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2017-12-15 03:01 UTC by Sverd Johnsen
Modified: 2019-11-19 08:27 UTC
Description Sverd Johnsen 2017-12-15 03:01:50 UTC
The use case for a clean unload: once the GPU is no longer needed on the host, it can be moved into a VM with VFIO. Moving from VFIO back to amdgpu already seems to work fine.
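For context, a minimal sketch of how a PCI device is typically handed from one driver to another through sysfs, which is the kind of amdgpu to vfio-pci move described above. The function name `rebind_pci_device` and its `sysfs_root` parameter are made up for illustration; `unbind`, `driver_override` and `drivers_probe` are the standard sysfs PCI attributes. Unbinding goes through the driver's PCI remove callback, so it would presumably exercise the same teardown path as rmmod.

```python
from pathlib import Path


def rebind_pci_device(bdf, new_driver, sysfs_root=Path("/sys")):
    """Detach a PCI device from its current driver and hand it to new_driver.

    bdf is the PCI address, e.g. "0000:01:00.0"; new_driver is e.g. "vfio-pci".
    sysfs_root is a hypothetical parameter so the sketch can be tried on a
    scratch directory. Returns the list of sysfs files written, in order.
    """
    root = Path(sysfs_root)
    dev = root / "bus" / "pci" / "devices" / bdf
    written = []

    def write(path, value):
        path.write_text(value)
        written.append(path)

    # Unbind from whatever driver currently owns the device, if any.
    if (dev / "driver").exists():
        write(dev / "driver" / "unbind", bdf)
    # Restrict matching to the requested driver, then ask the PCI core to probe.
    write(dev / "driver_override", new_driver)
    write(root / "bus" / "pci" / "drivers_probe", bdf)
    return written
```

On a real system this needs root, and something like `rebind_pci_device("0000:01:00.0", "vfio-pci")` would only succeed if the vfio-pci module is already loaded.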

loading:

[46666.751628] kernel: LoadPin: kernel-module pinning-ignored obj="/usr/lib/modules/4.14.5-5-ph/kernel/drivers/gpu/drm/ttm/ttm.ko" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46666.769592] kernel: LoadPin: kernel-module pinning-ignored obj="/usr/lib/modules/4.14.5-5-ph/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.037887] kernel: [drm] amdgpu kernel modesetting enabled.
[46667.037938] kernel: amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
[46667.038049] kernel: [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67EF 0x1462:0x809D 0xCF).
[46667.038079] kernel: [drm] register mmio base: 0xEFE00000
[46667.038079] kernel: [drm] register mmio size: 262144
[46667.038087] kernel: [drm] probing gen 2 caps for device 8086:1901 = 261ad03/e
[46667.038087] kernel: [drm] probing mlw for device 8086:1901 = 261ad03
[46667.038093] kernel: [drm] UVD is enabled in VM mode
[46667.038094] kernel: [drm] VCE enabled in VM mode
[46667.038114] kernel: ATOM BIOS: 113-C99401-S01
[46667.038119] kernel: [drm] GPU post is not needed
[46667.038184] kernel: [drm] vm size is 64 GB, block size is 13-bit, fragment size is 4-bit
[46667.038205] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_mc.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.038643] kernel: amdgpu 0000:01:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[46667.038644] kernel: amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[46667.038649] kernel: [drm] Detected VRAM RAM=2048M, BAR=256M
[46667.038649] kernel: [drm] RAM width 128bits GDDR5
[46667.039152] kernel: [TTM] Zone  kernel: Available graphics memory: 8082768 kiB
[46667.039152] kernel: [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[46667.039153] kernel: [TTM] Initializing pool allocator
[46667.039155] kernel: [TTM] Initializing DMA pool allocator
[46667.039167] kernel: [drm] amdgpu: 2048M of VRAM memory ready
[46667.039168] kernel: [drm] amdgpu: 3072M of GTT memory ready.
[46667.039202] kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
[46667.039284] kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
[46667.039300] kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[46667.039301] kernel: [drm] Driver supports precise vblank timestamp query.
[46667.039338] kernel: amdgpu 0000:01:00.0: amdgpu: using MSI.
[46667.039348] kernel: [drm] amdgpu: irq initialized.
[46667.168524] kernel: amdgpu: [powerplay] amdgpu: powerplay sw initialized
[46667.168612] kernel: [drm] AMDGPU Display Connectors
[46667.168612] kernel: [drm] Connector 0:
[46667.168613] kernel: [drm]   DP-2
[46667.168613] kernel: [drm]   HPD5
[46667.168614] kernel: [drm]   DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b
[46667.168614] kernel: [drm]   Encoders:
[46667.168614] kernel: [drm]     DFP1: INTERNAL_UNIPHY1
[46667.168614] kernel: [drm] Connector 1:
[46667.168615] kernel: [drm]   HDMI-A-4
[46667.168615] kernel: [drm]   HPD3
[46667.168615] kernel: [drm]   DDC: 0x4874 0x4874 0x4875 0x4875 0x4876 0x4876 0x4877 0x4877
[46667.168615] kernel: [drm]   Encoders:
[46667.168616] kernel: [drm]     DFP2: INTERNAL_UNIPHY1
[46667.168616] kernel: [drm] Connector 2:
[46667.168616] kernel: [drm]   DVI-D-1
[46667.168616] kernel: [drm]   HPD4
[46667.168617] kernel: [drm]   DDC: 0x4878 0x4878 0x4879 0x4879 0x487a 0x487a 0x487b 0x487b
[46667.168617] kernel: [drm]   Encoders:
[46667.168617] kernel: [drm]     DFP3: INTERNAL_UNIPHY
[46667.168631] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_pfp.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.168927] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_me.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.169170] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_ce.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.169297] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_rlc.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.169543] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_mec.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.170386] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_mec2.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.171247] kernel: amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0xffffa8df02931040
[46667.171271] kernel: amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0xffffa8df029310c0
[46667.171286] kernel: amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0xffffa8df02931140
[46667.171301] kernel: amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0xffffa8df029311c0
[46667.171321] kernel: amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0xffffa8df02931240
[46667.171332] kernel: amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0xffffa8df029312c0
[46667.171348] kernel: amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0xffffa8df02931340
[46667.171363] kernel: amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0xffffa8df029313c0
[46667.171379] kernel: amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0xffffa8df02931440
[46667.171392] kernel: amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0xffffa8df029314e0
[46667.171674] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_sdma.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.171886] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_sdma1.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.172092] kernel: amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0xffffa8df02931560
[46667.172112] kernel: amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0xffffa8df029315e0
[46667.172124] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_uvd.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.173484] kernel: [drm] Found UVD firmware Version: 1.79 Family ID: 16
[46667.173796] kernel: amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x000000f4002ad420, cpu addr 0xffffa8df07a5a420
[46667.173807] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_vce.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.174451] kernel: [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[46667.174510] kernel: amdgpu 0000:01:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0xffffa8df029316e0
[46667.174523] kernel: amdgpu 0000:01:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0xffffa8df02931760
[46667.174668] kernel: LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_smc.bin" pid=940 cmdline="modprobe amdgpu disp_priority=1"
[46667.236028] kernel: amdgpu: [powerplay] 
                        failed to send message 309 ret is 254 
[46667.236041] kernel: amdgpu: [powerplay] 
                        failed to send pre message 14e ret is 254 
[46667.246260] kernel: [drm] ring test on 0 succeeded in 18 usecs
[46667.246804] kernel: [drm] ring test on 9 succeeded in 12 usecs
[46667.246823] kernel: [drm] ring test on 1 succeeded in 9 usecs
[46667.246875] kernel: [drm] ring test on 2 succeeded in 26 usecs
[46667.246913] kernel: [drm] ring test on 3 succeeded in 19 usecs
[46667.246950] kernel: [drm] ring test on 4 succeeded in 18 usecs
[46667.246988] kernel: [drm] ring test on 5 succeeded in 19 usecs
[46667.247026] kernel: [drm] ring test on 6 succeeded in 19 usecs
[46667.247064] kernel: [drm] ring test on 7 succeeded in 19 usecs
[46667.247103] kernel: [drm] ring test on 8 succeeded in 19 usecs
[46667.247152] kernel: [drm] ring test on 10 succeeded in 7 usecs
[46667.247160] kernel: [drm] ring test on 11 succeeded in 7 usecs
[46667.293413] kernel: [drm] ring test on 12 succeeded in 1 usecs
[46667.293414] kernel: [drm] UVD initialized successfully.
[46667.404389] kernel: [drm] ring test on 13 succeeded in 7 usecs
[46667.404399] kernel: [drm] ring test on 14 succeeded in 3 usecs
[46667.404399] kernel: [drm] VCE initialized successfully.
[46667.404621] kernel: [drm] ib test on ring 0 succeeded
[46667.404787] kernel: [drm] ib test on ring 1 succeeded
[46667.404860] kernel: [drm] ib test on ring 2 succeeded
[46667.404920] kernel: [drm] ib test on ring 3 succeeded
[46667.404979] kernel: [drm] ib test on ring 4 succeeded
[46667.405039] kernel: [drm] ib test on ring 5 succeeded
[46667.405099] kernel: [drm] ib test on ring 6 succeeded
[46667.405159] kernel: [drm] ib test on ring 7 succeeded
[46667.405218] kernel: [drm] ib test on ring 8 succeeded
[46667.912359] kernel: [drm] ib test on ring 9 succeeded
[46667.912391] kernel: [drm] ib test on ring 10 succeeded
[46667.912414] kernel: [drm] ib test on ring 11 succeeded
[46667.914185] kernel: [drm] ib test on ring 12 succeeded
[46667.914409] kernel: [drm] ib test on ring 13 succeeded
[46667.942573] kernel: [drm] Cannot find any crtc or sizes
[46667.949202] kernel: [drm] Initialized amdgpu 3.19.0 20150101 for 0000:01:00.0 on minor 1
[46667.974151] kernel: [drm] Cannot find any crtc or sizes
[46667.999563] kernel: [drm] Cannot find any crtc or sizes




rmmod:




[46674.591081] kernel: [drm] amdgpu: finishing device.
[46674.591298] kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000258
[46674.591304] kernel: IP: mutex_lock+0xb/0x20
[46674.591305] kernel: PGD 3142bd067 P4D 3142bd067 PUD 224322067 PMD 0 
[46674.591307] kernel: Oops: 0002 [#1] PREEMPT SMP
[46674.591308] kernel: Modules linked in: amdgpu(-) ttm tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag msr zram zsmalloc bonding cls_u32 sch_htb af_packet nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user nft_counter xfrm_algo nft_meta n
[46674.591344] kernel:  snd_hwdep intel_rapl_perf snd_hda_core mei_me plusb efi_pstore snd_pcm usbnet mei input_leds mii efivars led_class tpm_crb usbip_host usbip_core efivarfs algif_skcipher af_alg joydev mousedev psmouse atkbd libps2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr tpm_tis shpchp tpm_tis_core thermal fan tpm i8042 battery acpi_pad vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[46674.591360] kernel: CPU: 1 PID: 1008 Comm: rmmod Tainted: G        W       4.14.5-5-ph #2
[46674.591360] kernel: Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[46674.591361] kernel: task: ffff9ae185286c00 task.stack: ffffa8df028a8000
[46674.591362] kernel: RIP: 0010:mutex_lock+0xb/0x20
[46674.591363] kernel: RSP: 0018:ffffa8df028abd98 EFLAGS: 00010246
[46674.591364] kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000180100005
[46674.591364] kernel: RDX: ffff9ae185286c00 RSI: ffff9ae064dd7b20 RDI: 0000000000000258
[46674.591365] kernel: RBP: 0000000000000258 R08: 0000000000000001 R09: ffff9ae064dd6e00
[46674.591365] kernel: R10: ffff9ae24cc01600 R11: ffff9ae17122abf0 R12: ffff9ae064dd7b20
[46674.591366] kernel: R13: ffffffffc089c2b0 R14: ffff9ae24cc25100 R15: 0000000001b0c260
[46674.591366] kernel: FS:  00007fd47a4a5b80(0000) GS:ffff9ae25ec80000(0000) knlGS:0000000000000000
[46674.591367] kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46674.591368] kernel: CR2: 0000000000000258 CR3: 00000001f6fd6001 CR4: 00000000003606e0
[46674.591368] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[46674.591369] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[46674.591369] kernel: Call Trace:
[46674.591373] kernel:  drm_mode_object_unregister+0x19/0x50
[46674.591391] kernel:  amdgpu_fbdev_fini+0x4c/0x70 [amdgpu]
[46674.591398] kernel:  amdgpu_device_fini+0x42/0x190 [amdgpu]
[46674.591402] kernel:  amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[46674.591405] kernel:  drm_dev_unregister+0x37/0xe0
[46674.591410] kernel:  amdgpu_pci_remove+0x10/0x20 [amdgpu]
[46674.591412] kernel:  pci_device_remove+0x31/0xa0
[46674.591414] kernel:  device_release_driver_internal+0x152/0x210
[46674.591416] kernel:  driver_detach+0x32/0x70
[46674.591417] kernel:  bus_remove_driver+0x4c/0xc0
[46674.591419] kernel:  pci_unregister_driver+0x24/0x90
[46674.591429] kernel:  amdgpu_exit+0x11/0x2fa [amdgpu]
[46674.591432] kernel:  SyS_delete_module+0x19a/0x230
[46674.591434] kernel:  do_syscall_64+0x49/0x100
[46674.591436] kernel:  entry_SYSCALL64_slow_path+0x25/0x25
[46674.591437] kernel: RIP: 0033:0x7fd479bb2b87
[46674.591438] kernel: RSP: 002b:00007ffdab7c1248 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[46674.591439] kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd479bb2b87
[46674.591439] kernel: RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000001b0c8a8
[46674.591440] kernel: RBP: 0000000001b0c840 R08: 00007ffdab7c01c1 R09: 0000000000000000
[46674.591440] kernel: R10: 00000000000008b2 R11: 0000000000000206 R12: 00007ffdab7c176a
[46674.591441] kernel: R13: 0000000000000000 R14: 0000000001b0c840 R15: 0000000001b0c260
[46674.591442] kernel: Code: 84 81 fd ff ff eb 87 e8 44 5a 88 ff 0f 1f 40 00 be 02 00 00 00 e9 f6 fa ff ff 66 0f 1f 44 00 00 65 48 8b 14 25 00 c4 00 00 31 c0 <f0> 48 0f b1 17 48 85 c0 75 02 f3 c3 eb d7 0f 1f 80 00 00 00 00 
[46674.591455] kernel: RIP: mutex_lock+0xb/0x20 RSP: ffffa8df028abd98
[46674.591456] kernel: CR2: 0000000000000258
[46674.591457] kernel: ---[ end trace 173a3ed54eae9b36 ]---

lspci output for the device (same session):
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/560] (rev cf) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Baffin [Radeon RX 460/560D / Pro 450/455/460/560]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 130
	Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=2M]
	Region 4: I/O ports at e000 [size=256]
	Region 5: Memory at efe00000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at efe40000 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <1us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported
			 AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00358  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [200 v1] #15
	Capabilities: [270 v1] #19
	Capabilities: [2b0 v1] Address Translation Service (ATS)
		ATSCap:	Invalidate Queue Depth: 00
		ATSCtl:	Enable-, Smallest Translation Unit: 00
	Capabilities: [2c0 v1] Page Request Interface (PRI)
		PRICtl: Enable- Reset-
		PRISta: RF- UPRGI- Stopped+
		Page Request Capacity: 00000020, Page Request Allocation: 00000000
	Capabilities: [2d0 v1] Process Address Space ID (PASID)
		PASIDCap: Exec+ Priv+, Max PASID Width: 10
		PASIDCtl: Enable- Exec- Priv-
	Capabilities: [320 v1] Latency Tolerance Reporting
		Max snoop latency: 71680ns
		Max no snoop latency: 71680ns
	Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 1
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [370 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=0us PortTPowerOnTime=170us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Kernel modules: amdgpu
Comment 1 Sverd Johnsen 2017-12-15 04:27:07 UTC
With 4.15-rc (Linus' tree from today):

[   78.807441] [drm] amdgpu: finishing device.
[   78.887439] amdgpu: [powerplay] 
[   78.887454] amdgpu: [powerplay] 
[   78.968349] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   78.968352] IP:           (null)
[   78.968352] PGD 45af1d067 P4D 45af1d067 PUD 45b065067 PMD 0 
[   78.968354] Oops: 0010 [#1] PREEMPT SMP
[   78.968355] Modules linked in: amdgpu(-) chash ttm cls_u32 sch_htb af_packet nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_netdev nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio libcrc32c bcache x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net raid0 raid1 tun vhost tap kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec md_mod snd_hwdep irqbypass snd_hda_core intel_cstate mei_me efi_pstore intel_uncore plusb mei input_leds usbnet snd_pcm intel_rapl_perf
[   78.968375]  led_class mii efivars tpm_crb usbip_host usbip_core efivarfs algif_skcipher af_alg mousedev joydev psmouse atkbd libps2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr tpm_tis shpchp tpm_tis_core thermal fan tpm i8042 acpi_pad vfat fat
[   78.968383] CPU: 1 PID: 1493 Comm: rmmod Not tainted 4.15.0-1-rc #2
[   78.968384] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[   78.968384] RIP: 0010:          (null)
[   78.968385] RSP: 0018:ffff9c6a8281bd00 EFLAGS: 00010286
[   78.968386] RAX: ffff9294612676e0 RBX: ffff92948b2a3300 RCX: 000000018020001e
[   78.968386] RDX: 000000018020001f RSI: 0000000000005c02 RDI: ffff929461267120
[   78.968387] RBP: ffff9294658a6c90 R08: 0000000000000001 R09: ffff929476247000
[   78.968387] R10: 0000000000000000 R11: 0000000000000033 R12: 0000000000000003
[   78.968388] R13: ffff929453e42f18 R14: 0000000000000000 R15: ffffffffc09b13e8
[   78.968389] FS:  00007f0d7f498b80(0000) GS:ffff92949ec80000(0000) knlGS:0000000000000000
[   78.968389] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   78.968390] CR2: 0000000000000000 CR3: 000000046b3f8001 CR4: 00000000003606e0
[   78.968390] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   78.968391] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   78.968391] Call Trace:
[   78.968429]  ? destroy+0x1d/0xa0 [amdgpu]
[   78.968440]  ? dal_i2caux_destruct+0x58/0x90 [amdgpu]
[   78.968449]  ? destroy+0x10/0x30 [amdgpu]
[   78.968458]  ? dal_i2caux_destroy+0x17/0x30 [amdgpu]
[   78.968467]  ? destruct+0x89/0x110 [amdgpu]
[   78.968476]  ? dc_destroy+0xc/0x20 [amdgpu]
[   78.968488]  ? dm_hw_fini+0x19/0x20 [amdgpu]
[   78.968493]  ? amdgpu_fini+0x90/0x2f0 [amdgpu]
[   78.968499]  ? amdgpu_device_fini+0x5f/0x1c0 [amdgpu]
[   78.968505]  ? amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[   78.968507]  ? drm_dev_unregister+0x37/0xe0
[   78.968512]  ? amdgpu_pci_remove+0x14/0x40 [amdgpu]
[   78.968514]  ? pci_device_remove+0x31/0xa0
[   78.968516]  ? device_release_driver_internal+0x152/0x210
[   78.968517]  ? driver_detach+0x32/0x70
[   78.968518]  ? bus_remove_driver+0x4c/0xc0
[   78.968519]  ? pci_unregister_driver+0x25/0xa0
[   78.968529]  ? amdgpu_exit+0x11/0x3b6 [amdgpu]
[   78.968530]  ? SyS_delete_module+0x19c/0x2a0
[   78.968532]  ? do_syscall_64+0x48/0xe0
[   78.968533]  ? entry_SYSCALL64_slow_path+0x25/0x25
[   78.968534] Code:  Bad RIP value.
[   78.968536] RIP:           (null) RSP: ffff9c6a8281bd00
[   78.968537] CR2: 0000000000000000
[   78.968538] ---[ end trace bbf73cfe467dd4c8 ]---
Comment 2 Michel Dänzer 2017-12-15 09:17:45 UTC
The problem in the original report was fixed by https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a072c5f896beba806b4b867d478e1b90f94ba29b .

The crash in comment 1 looks like a new issue in DC (Display Core). It might be related to the core DRM code now only initializing fbdev compatibility when a display connection is detected, as with the change above.

Sverd, until the latter is fixed, you could check whether amdgpu.dc=0 helps.
Comment 3 Sverd Johnsen 2017-12-15 15:59:43 UTC
Thanks for pointing me to that. It seems like it should have gone to stable.

Alright, here is what I do now:

I boot with the Intel GPU; amdgpu is blacklisted in modules so it doesn't get autoloaded.
Then I load amdgpu by hand, with dc on or off. This time I make sure the display that is attached via DisplayPort to amdgpu is on.

I cannot unload until I use

echo 0 > /sys/devices/virtual/vtconsole/vtcon1/bind
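A rough sketch of the same unbind step that does not hard-code vtcon1: it scans the vtconsole directory and writes 0 to the bind file of every console that is currently bound and looks like the framebuffer console. The function name and the `root` parameter are hypothetical, and matching on "frame buffer" in the `name` file is an assumption about how fbcon identifies itself on a given system.

```python
from pathlib import Path

# Default taken from the path used above; other systems may expose it elsewhere.
VTCONSOLE_ROOT = Path("/sys/devices/virtual/vtconsole")


def unbind_framebuffer_consoles(root=VTCONSOLE_ROOT):
    """Unbind every bound vtconsole whose name mentions a frame buffer.

    Returns the names of the vtcon directories that were unbound.
    """
    unbound = []
    for vtcon in sorted(Path(root).glob("vtcon*")):
        name_file = vtcon / "name"
        bind_file = vtcon / "bind"
        if not name_file.is_file() or not bind_file.is_file():
            continue
        # Assumption: fbcon reports itself with "frame buffer" in its name.
        if "frame buffer" not in name_file.read_text():
            continue
        # Only touch consoles that are currently bound.
        if bind_file.read_text().strip() == "1":
            bind_file.write_text("0")
            unbound.append(vtcon.name)
    return unbound
```

Run as root before rmmod, calling `unbind_framebuffer_consoles()` would be the scripted equivalent of the echo above.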

Here it is with dc=1:

rmmod hangs.


[   45.665237] LoadPin: kernel-module pinning-ignored obj="/usr/lib/modules/4.15.0-1-rc/kernel/drivers/gpu/drm/ttm/ttm.ko" pid=1201 cmdline="modprobe amdgpu dc=1"
[   45.682434] LoadPin: kernel-module pinning-ignored obj="/usr/lib/modules/4.15.0-1-rc/kernel/drivers/gpu/drm/amd/lib/chash.ko" pid=1201 cmdline="modprobe amdgpu dc=1"
[   45.693068] LoadPin: kernel-module pinning-ignored obj="/usr/lib/modules/4.15.0-1-rc/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.202469] [drm] amdgpu kernel modesetting enabled.
[   46.202519] amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
[   46.202669] [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67EF 0x1462:0x809D 0xCF).
[   46.202681] [drm] register mmio base: 0xEFE00000
[   46.202682] [drm] register mmio size: 262144
[   46.202694] [drm] probing gen 2 caps for device 8086:1901 = 261ad03/e
[   46.202694] [drm] probing mlw for device 8086:1901 = 261ad03
[   46.202700] [drm] UVD is enabled in VM mode
[   46.202700] [drm] UVD ENC is enabled in VM mode
[   46.202702] [drm] VCE enabled in VM mode
[   46.202715] ATOM BIOS: 113-C99401-S01
[   46.202721] [drm] GPU post is not needed
[   46.202730] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 9-bit
[   46.203725] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_mc.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.204234] amdgpu 0000:01:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[   46.204235] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[   46.204241] [drm] Detected VRAM RAM=2048M, BAR=256M
[   46.204242] [drm] RAM width 128bits GDDR5
[   46.204321] [TTM] Zone  kernel: Available graphics memory: 8082548 kiB
[   46.204321] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[   46.204321] [TTM] Initializing pool allocator
[   46.204323] [TTM] Initializing DMA pool allocator
[   46.204333] [drm] amdgpu: 2048M of VRAM memory ready
[   46.204333] [drm] amdgpu: 3072M of GTT memory ready.
[   46.204366] [drm] GART: num cpu pages 65536, num gpu pages 65536
[   46.204414] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
[   46.204447] amdgpu 0000:01:00.0: amdgpu: using MSI.
[   46.204462] [drm] amdgpu: irq initialized.
[   46.204475] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[   46.204487] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_pfp_2.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.205094] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_me_2.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.217397] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_ce_2.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.217550] [drm] Chained IB support enabled!
[   46.217556] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_rlc.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.217822] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_mec_2.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.218718] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_mec2_2.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.219694] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x0000000072e5becd
[   46.219847] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x000000007d1f8b28
[   46.219906] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x000000001b657fe6
[   46.219946] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x000000009cd8ddd2
[   46.219993] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x000000004495111b
[   46.220039] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x00000000705ffa8e
[   46.220113] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x000000009005afe9
[   46.220160] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x00000000e3bdcb19
[   46.220203] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x00000000d3b3c22e
[   46.220214] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x0000000002a1dbbf
[   46.220487] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_sdma.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.220721] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_sdma1.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.230175] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x0000000091d8de17
[   46.230241] amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x0000000030660891
[   46.230263] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_uvd.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.240501] [drm] Found UVD firmware Version: 1.79 Family ID: 16
[   46.240850] amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x00000000b794fc25
[   46.240867] amdgpu 0000:01:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x00000000fe2dd157
[   46.240880] amdgpu 0000:01:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x00000000979c0f78
[   46.240896] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_vce.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.241568] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[   46.241682] amdgpu 0000:01:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x00000000397058ac
[   46.241814] amdgpu 0000:01:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x00000000331d7ca7
[   46.241959] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_smc.bin" pid=1201 cmdline="modprobe amdgpu dc=1"
[   46.310973] amdgpu: [powerplay] 
[   46.310986] amdgpu: [powerplay] 
[   46.319356] [drm] DM_PPLIB: values for Engine clock
[   46.319356] [drm] DM_PPLIB:	 21400
[   46.319357] [drm] DM_PPLIB:	 48100
[   46.319357] [drm] DM_PPLIB:	 76600
[   46.319357] [drm] DM_PPLIB:	 102800
[   46.319357] [drm] DM_PPLIB:	 111100
[   46.319357] [drm] DM_PPLIB:	 114700
[   46.319358] [drm] DM_PPLIB:	 118100
[   46.319358] [drm] DM_PPLIB:	 121000
[   46.319358] [drm] DM_PPLIB: Warning: using default validation clocks!
[   46.319358] [drm] DM_PPLIB: Validation clocks:
[   46.319358] [drm] DM_PPLIB:    engine_max_clock: 72000
[   46.319359] [drm] DM_PPLIB:    memory_max_clock: 80000
[   46.319359] [drm] DM_PPLIB:    level           : 0
[   46.319359] [drm] DM_PPLIB: reducing engine clock level from 8 to 2
[   46.319360] [drm] DM_PPLIB: values for Memory clock
[   46.319360] [drm] DM_PPLIB:	 30000
[   46.319360] [drm] DM_PPLIB:	 175000
[   46.319360] [drm] DM_PPLIB: Warning: using default validation clocks!
[   46.319361] [drm] DM_PPLIB: Validation clocks:
[   46.319361] [drm] DM_PPLIB:    engine_max_clock: 72000
[   46.319361] [drm] DM_PPLIB:    memory_max_clock: 80000
[   46.319361] [drm] DM_PPLIB:    level           : 0
[   46.319362] [drm] DM_PPLIB: reducing memory clock level from 2 to 1
[   46.319362] [drm] DC: create_links: connectors_num: physical:3, virtual:0
[   46.335157] [drm] Display Core initialized!
[   46.341040] [drm] Rx Caps: 
[   46.465504] [drm] HBRx4 pass VS=0, PE=0
[   46.465794] [drm] LG ULTRAWIDE: [Block 0] 
[   46.465795] [drm] LG ULTRAWIDE: [Block 1] 
[   46.465796] [drm] dc_link_detect: manufacturer_id = 6D1E, product_id = 59F2, serial_number = 1010101, manufacture_week = 1, manufacture_year = 23, display_name = LG ULTRAWIDE, speaker_flag = 1, audio_mode_count = 1
[   46.465796] [drm] dc_link_detect: mode number = 0, format_code = 1, channel_count = 1, sample_rate = 7, sample_size = 7
[   46.466298] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   46.466299] [drm] Driver supports precise vblank timestamp query.
[   46.467687] [drm] ring test on 0 succeeded in 18 usecs
[   46.468436] [drm] ring test on 9 succeeded in 13 usecs
[   46.468458] [drm] ring test on 1 succeeded in 10 usecs
[   46.468511] [drm] ring test on 2 succeeded in 24 usecs
[   46.468549] [drm] ring test on 3 succeeded in 18 usecs
[   46.468587] [drm] ring test on 4 succeeded in 19 usecs
[   46.468625] [drm] ring test on 5 succeeded in 18 usecs
[   46.468663] [drm] ring test on 6 succeeded in 19 usecs
[   46.468702] [drm] ring test on 7 succeeded in 18 usecs
[   46.468740] [drm] ring test on 8 succeeded in 17 usecs
[   46.468798] [drm] ring test on 10 succeeded in 7 usecs
[   46.468805] [drm] ring test on 11 succeeded in 6 usecs
[   46.515372] [drm] ring test on 12 succeeded in 1 usecs
[   46.515420] [drm] ring test on 13 succeeded in 20 usecs
[   46.515432] [drm] ring test on 14 succeeded in 4 usecs
[   46.515432] [drm] UVD and UVD ENC initialized successfully.
[   46.626345] [drm] ring test on 15 succeeded in 7 usecs
[   46.626355] [drm] ring test on 16 succeeded in 3 usecs
[   46.626355] [drm] VCE initialized successfully.
[   46.626593] [drm] ib test on ring 0 succeeded
[   46.626823] [drm] ib test on ring 1 succeeded
[   46.626892] [drm] ib test on ring 2 succeeded
[   46.626947] [drm] ib test on ring 3 succeeded
[   46.627010] [drm] ib test on ring 4 succeeded
[   46.627097] [drm] ib test on ring 5 succeeded
[   46.627162] [drm] ib test on ring 6 succeeded
[   46.627259] [drm] ib test on ring 7 succeeded
[   46.627301] [drm] ib test on ring 8 succeeded
[   47.128114] [drm] ib test on ring 9 succeeded
[   47.128212] [drm] ib test on ring 10 succeeded
[   47.128300] [drm] ib test on ring 11 succeeded
[   47.130133] [drm] ib test on ring 12 succeeded
[   47.130553] [drm] ib test on ring 13 succeeded
[   47.130886] [drm] ib test on ring 14 succeeded
[   47.131156] [drm] ib test on ring 15 succeeded
[   47.133086] [drm] fb mappable at 0xD03F2000
[   47.133086] [drm] vram apper at 0xD0000000
[   47.133087] [drm] size 11059200
[   47.133087] [drm] fb depth is 24
[   47.133087] [drm]    pitch is 10240
[   47.133223] fbcon: amdgpudrmfb (fb1) is primary device
[   47.133223] fbcon: Remapping primary device, fb1, to tty 1-63
[   47.135020] [drm] {2560x1080, 2784x1111@185580Khz}
[   47.139059] [drm] RBRx4 pass VS=0, PE=0
[   47.168689] amdgpu 0000:01:00.0: fb1: amdgpudrmfb frame buffer device
[   47.168786] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:01:00.0 on minor 1

[   78.983798] Console: switching to colour dummy device 80x25

[   83.616305] [drm] amdgpu: finishing device.
[   83.616315] WARNING: CPU: 0 PID: 1380 at drivers/gpu/drm/drm_crtc.c:108 drm_crtc_force_disable+0x58/0x70
[   83.616315] Modules linked in: amdgpu(-) chash ttm cls_u32 sch_htb af_packet nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_netdev nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio libcrc32c bcache raid1 raid0 x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net tun vhost snd_hda_codec_realtek tap snd_hda_codec_generic kvm snd_hda_codec_hdmi md_mod snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core mei_me efi_pstore irqbypass plusb mei intel_cstate usbnet snd_pcm intel_uncore input_leds mii led_class
[   83.616341]  intel_rapl_perf efivars tpm_crb usbip_host usbip_core efivarfs algif_skcipher af_alg joydev mousedev psmouse atkbd crct10dif_pclmul libps2 crc32_pclmul ghash_clmulni_intel pcspkr tpm_tis tpm_tis_core shpchp fan tpm thermal i8042 acpi_pad vfat fat
[   83.616353] CPU: 0 PID: 1380 Comm: rmmod Not tainted 4.15.0-1-rc #2
[   83.616353] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[   83.616354] RIP: 0010:drm_crtc_force_disable+0x58/0x70
[   83.616355] RSP: 0018:ffffb3130559bd68 EFLAGS: 00010286
[   83.616355] RAX: ffffffffc0848b80 RBX: ffff9a179d654000 RCX: 0000000000000000
[   83.616356] RDX: ffffb3130559bd68 RSI: ffff9a179d654000 RDI: ffffb3130559bd98
[   83.616356] RBP: ffff9a17caf4d368 R08: ffff9a179d1d9048 R09: ffff9a179f620940
[   83.616356] R10: 000000000000000f R11: 0000000000000000 R12: ffff9a17caf4d000
[   83.616357] R13: ffffffffc08d1370 R14: ffff9a17cc787100 R15: ffffffffc08d13e8
[   83.616357] FS:  00007fc7e4463b80(0000) GS:ffff9a17dec00000(0000) knlGS:0000000000000000
[   83.616358] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.616358] CR2: 0000000001e19d48 CR3: 000000045d4a6002 CR4: 00000000003606f0
[   83.616358] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   83.616359] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   83.616359] Call Trace:
[   83.616362]  drm_crtc_force_disable_all+0x49/0x70
[   83.616376]  amdgpu_device_fini+0x1ad/0x1c0 [amdgpu]
[   83.616383]  amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[   83.616384]  drm_dev_unregister+0x37/0xe0
[   83.616389]  amdgpu_pci_remove+0x14/0x40 [amdgpu]
[   83.616391]  pci_device_remove+0x31/0xa0
[   83.616393]  device_release_driver_internal+0x152/0x210
[   83.616394]  driver_detach+0x32/0x70
[   83.616396]  bus_remove_driver+0x4c/0xc0
[   83.616397]  pci_unregister_driver+0x25/0xa0
[   83.616406]  amdgpu_exit+0x11/0x3b6 [amdgpu]
[   83.616408]  SyS_delete_module+0x19c/0x2a0
[   83.616410]  do_syscall_64+0x48/0xe0
[   83.616412]  entry_SYSCALL64_slow_path+0x25/0x25
[   83.616413] RIP: 0033:0x7fc7e3b70b87
[   83.616413] RSP: 002b:00007ffd6adc1d38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[   83.616414] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc7e3b70b87
[   83.616414] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000001e0f8a8
[   83.616415] RBP: 0000000001e0f840 R08: 00007ffd6adc0cb1 R09: 0000000000000000
[   83.616415] R10: 00000000000008b2 R11: 0000000000000202 R12: 00007ffd6adc3750
[   83.616415] R13: 0000000000000000 R14: 0000000001e0f840 R15: 0000000001e0f260
[   83.616416] Code: 48 8b 80 98 03 00 00 48 83 78 20 00 75 1d 48 89 d7 e8 8d ff ff ff 48 8b 4c 24 30 65 48 33 0c 25 28 00 00 00 75 09 48 83 c4 38 c3 <0f> ff eb df e8 bf 96 b7 ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 
[   83.616429] ---[ end trace b52f33037e0770c4 ]---
[   83.616436] WARNING: CPU: 0 PID: 1380 at drivers/gpu/drm/drm_crtc.c:499 drm_mode_set_config_internal+0x1c/0x30
[   83.616437] Modules linked in: amdgpu(-) chash ttm cls_u32 sch_htb af_packet nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_netdev nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio libcrc32c bcache raid1 raid0 x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net tun vhost snd_hda_codec_realtek tap snd_hda_codec_generic kvm snd_hda_codec_hdmi md_mod snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core mei_me efi_pstore irqbypass plusb mei intel_cstate usbnet snd_pcm intel_uncore input_leds mii led_class
[   83.616455]  intel_rapl_perf efivars tpm_crb usbip_host usbip_core efivarfs algif_skcipher af_alg joydev mousedev psmouse atkbd crct10dif_pclmul libps2 crc32_pclmul ghash_clmulni_intel pcspkr tpm_tis tpm_tis_core shpchp fan tpm thermal i8042 acpi_pad vfat fat
[   83.616462] CPU: 0 PID: 1380 Comm: rmmod Tainted: G        W        4.15.0-1-rc #2
[   83.616463] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[   83.616464] RIP: 0010:drm_mode_set_config_internal+0x1c/0x30
[   83.616464] RSP: 0018:ffffb3130559bd60 EFLAGS: 00010286
[   83.616465] RAX: ffffffffc0848b80 RBX: ffff9a179d654000 RCX: 0000000000000000
[   83.616465] RDX: ffffb3130559bd68 RSI: ffff9a179d654000 RDI: ffffb3130559bd68
[   83.616465] RBP: ffff9a17caf4d368 R08: ffff9a179d1d9048 R09: ffff9a179f620940
[   83.616465] R10: 000000000000000f R11: 0000000000000000 R12: ffff9a17caf4d000
[   83.616466] R13: ffffffffc08d1370 R14: ffff9a17cc787100 R15: ffffffffc08d13e8
[   83.616466] FS:  00007fc7e4463b80(0000) GS:ffff9a17dec00000(0000) knlGS:0000000000000000
[   83.616467] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.616467] CR2: 0000000001e19d48 CR3: 000000045d4a6002 CR4: 00000000003606f0
[   83.616467] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   83.616468] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   83.616468] Call Trace:
[   83.616469]  drm_crtc_force_disable+0x43/0x70
[   83.616470]  drm_crtc_force_disable_all+0x49/0x70
[   83.616479]  amdgpu_device_fini+0x1ad/0x1c0 [amdgpu]
[   83.616486]  amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[   83.616487]  drm_dev_unregister+0x37/0xe0
[   83.616493]  amdgpu_pci_remove+0x14/0x40 [amdgpu]
[   83.616494]  pci_device_remove+0x31/0xa0
[   83.616495]  device_release_driver_internal+0x152/0x210
[   83.616496]  driver_detach+0x32/0x70
[   83.616497]  bus_remove_driver+0x4c/0xc0
[   83.616498]  pci_unregister_driver+0x25/0xa0
[   83.616508]  amdgpu_exit+0x11/0x3b6 [amdgpu]
[   83.616509]  SyS_delete_module+0x19c/0x2a0
[   83.616510]  do_syscall_64+0x48/0xe0
[   83.616511]  entry_SYSCALL64_slow_path+0x25/0x25
[   83.616512] RIP: 0033:0x7fc7e3b70b87
[   83.616512] RSP: 002b:00007ffd6adc1d38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[   83.616513] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc7e3b70b87
[   83.616513] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000001e0f8a8
[   83.616513] RBP: 0000000001e0f840 R08: 00007ffd6adc0cb1 R09: 0000000000000000
[   83.616514] R10: 00000000000008b2 R11: 0000000000000202 R12: 00007ffd6adc3750
[   83.616514] R13: 0000000000000000 R14: 0000000001e0f840 R15: 0000000001e0f260
[   83.616515] Code: 8e b8 ea ff ff ff c3 b8 fe ff ff ff eb 8b 90 48 8b 47 08 48 8b 00 48 8b 80 98 03 00 00 48 83 78 20 00 75 07 31 f6 e9 a4 fb ff ff <0f> ff 31 f6 e9 9b fb ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 48 
[   83.616527] ---[ end trace b52f33037e0770c5 ]---
[   83.616540] WARNING: CPU: 0 PID: 1380 at drivers/gpu/drm/drm_atomic.c:272 drm_atomic_get_crtc_state+0xdb/0x100
[   83.616540] Modules linked in: amdgpu(-) chash ttm cls_u32 sch_htb af_packet nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_netdev nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio libcrc32c bcache raid1 raid0 x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net tun vhost snd_hda_codec_realtek tap snd_hda_codec_generic kvm snd_hda_codec_hdmi md_mod snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core mei_me efi_pstore irqbypass plusb mei intel_cstate usbnet snd_pcm intel_uncore input_leds mii led_class
[   83.616552]  intel_rapl_perf efivars tpm_crb usbip_host usbip_core efivarfs algif_skcipher af_alg joydev mousedev psmouse atkbd crct10dif_pclmul libps2 crc32_pclmul ghash_clmulni_intel pcspkr tpm_tis tpm_tis_core shpchp fan tpm thermal i8042 acpi_pad vfat fat
[   83.616557] CPU: 0 PID: 1380 Comm: rmmod Tainted: G        W        4.15.0-1-rc #2
[   83.616557] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[   83.616558] RIP: 0010:drm_atomic_get_crtc_state+0xdb/0x100
[   83.616559] RSP: 0018:ffffb3130559bcb0 EFLAGS: 00010246
[   83.616559] RAX: 0000000000000000 RBX: ffffb3130559bd68 RCX: 0000000000000000
[   83.616559] RDX: ffffffff85c3043d RSI: ffff9a179d654000 RDI: ffff9a179d452a80
[   83.616560] RBP: ffffb3130559bd68 R08: ffff9a179f5da000 R09: ffff9a179f5da000
[   83.616560] R10: 000000000000000f R11: 0000000000000000 R12: 0000000000000000
[   83.616560] R13: ffffffffc08d1370 R14: ffff9a179d654000 R15: ffffffffc08d13e8
[   83.616561] FS:  00007fc7e4463b80(0000) GS:ffff9a17dec00000(0000) knlGS:0000000000000000
[   83.616561] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.616561] CR2: 0000000001e19d48 CR3: 000000045d4a6002 CR4: 00000000003606f0
[   83.616562] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   83.616562] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   83.616562] Call Trace:
[   83.616564]  ? retint_kernel+0x1b/0x1d
[   83.616565]  __drm_atomic_helper_set_config+0x33/0x2f0
[   83.616567]  drm_atomic_helper_set_config+0x31/0x90
[   83.616568]  __drm_mode_set_config_internal+0x5c/0x110
[   83.616569]  drm_crtc_force_disable+0x43/0x70
[   83.616570]  drm_crtc_force_disable_all+0x49/0x70
[   83.616577]  amdgpu_device_fini+0x1ad/0x1c0 [amdgpu]
[   83.616583]  amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[   83.616584]  drm_dev_unregister+0x37/0xe0
[   83.616589]  amdgpu_pci_remove+0x14/0x40 [amdgpu]
[   83.616590]  pci_device_remove+0x31/0xa0
[   83.616592]  device_release_driver_internal+0x152/0x210
[   83.616593]  driver_detach+0x32/0x70
[   83.616594]  bus_remove_driver+0x4c/0xc0
[   83.616594]  pci_unregister_driver+0x25/0xa0
[   83.616604]  amdgpu_exit+0x11/0x3b6 [amdgpu]
[   83.616605]  SyS_delete_module+0x19c/0x2a0
[   83.616606]  do_syscall_64+0x48/0xe0
[   83.616607]  entry_SYSCALL64_slow_path+0x25/0x25
[   83.616607] RIP: 0033:0x7fc7e3b70b87
[   83.616607] RSP: 002b:00007ffd6adc1d38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[   83.616608] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc7e3b70b87
[   83.616608] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000001e0f8a8
[   83.616608] RBP: 0000000001e0f840 R08: 00007ffd6adc0cb1 R09: 0000000000000000
[   83.616609] R10: 00000000000008b2 R11: 0000000000000202 R12: 00007ffd6adc3750
[   83.616609] R13: 0000000000000000 R14: 0000000001e0f840 R15: 0000000001e0f260
[   83.616610] Code: 10 02 00 00 48 c7 c2 c0 06 c3 85 8b 4d 60 53 4c 8b 45 20 48 89 44 24 08 e8 f3 23 ff ff 58 48 8b 04 24 48 83 c4 08 5b 5d 41 5c c3 <0f> ff e9 3a ff ff ff 48 c7 c0 f4 ff ff ff e9 46 ff ff ff 48 98 
[   83.616622] ---[ end trace b52f33037e0770c6 ]---

After these warnings, rmmod hangs:

[  185.311746] INFO: task rmmod:1380 blocked for more than 60 seconds.
[  185.311749]       Tainted: G        W        4.15.0-1-rc #2
[  185.311750] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  185.311750] rmmod           D    0  1380   1169 0x80000080
[  185.311752] Call Trace:
[  185.311755]  ? __schedule+0x1ac/0x6d0
[  185.311756]  schedule+0x2a/0x80
[  185.311757]  schedule_preempt_disabled+0xc/0x20
[  185.311758]  __ww_mutex_lock.isra.1+0x54a/0x6c0
[  185.311759]  ? preempt_schedule_irq+0x27/0x50
[  185.311761]  ? drm_modeset_lock+0xd1/0xf0
[  185.311762]  drm_modeset_lock+0xd1/0xf0
[  185.311763]  drm_atomic_get_crtc_state+0x4f/0x100
[  185.311766]  ? retint_kernel+0x1b/0x1d
[  185.311768]  __drm_atomic_helper_set_config+0x33/0x2f0
[  185.311769]  drm_atomic_helper_set_config+0x31/0x90
[  185.311771]  __drm_mode_set_config_internal+0x5c/0x110
[  185.311773]  drm_crtc_force_disable+0x43/0x70
[  185.311774]  drm_crtc_force_disable_all+0x49/0x70
[  185.311786]  amdgpu_device_fini+0x1ad/0x1c0 [amdgpu]
[  185.311792]  amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[  185.311793]  drm_dev_unregister+0x37/0xe0
[  185.311799]  amdgpu_pci_remove+0x14/0x40 [amdgpu]
[  185.311800]  pci_device_remove+0x31/0xa0
[  185.311803]  device_release_driver_internal+0x152/0x210
[  185.311804]  driver_detach+0x32/0x70
[  185.311805]  bus_remove_driver+0x4c/0xc0
[  185.311806]  pci_unregister_driver+0x25/0xa0
[  185.311815]  amdgpu_exit+0x11/0x3b6 [amdgpu]
[  185.311820]  SyS_delete_module+0x19c/0x2a0
[  185.311823]  do_syscall_64+0x48/0xe0
[  185.311824]  entry_SYSCALL64_slow_path+0x25/0x25
[  185.311825] RIP: 0033:0x7fc7e3b70b87
[  185.311825] RSP: 002b:00007ffd6adc1d38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[  185.311826] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc7e3b70b87
[  185.311826] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000001e0f8a8
[  185.311827] RBP: 0000000001e0f840 R08: 00007ffd6adc0cb1 R09: 0000000000000000
[  185.311827] R10: 00000000000008b2 R11: 0000000000000202 R12: 00007ffd6adc3750
[  185.311828] R13: 0000000000000000 R14: 0000000001e0f840 R15: 0000000001e0f260
[  246.751527] INFO: task rmmod:1380 blocked for more than 60 seconds.
[  246.751530]       Tainted: G        W        4.15.0-1-rc #2
[  246.751530] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  246.751531] rmmod           D    0  1380   1169 0x80000080
[  246.751532] Call Trace:
[  246.751536]  ? __schedule+0x1ac/0x6d0
[  246.751537]  schedule+0x2a/0x80
[  246.751538]  schedule_preempt_disabled+0xc/0x20
[  246.751539]  __ww_mutex_lock.isra.1+0x54a/0x6c0
[  246.751540]  ? preempt_schedule_irq+0x27/0x50
[  246.751542]  ? drm_modeset_lock+0xd1/0xf0
[  246.751543]  drm_modeset_lock+0xd1/0xf0
[  246.751544]  drm_atomic_get_crtc_state+0x4f/0x100
[  246.751545]  ? retint_kernel+0x1b/0x1d
[  246.751548]  __drm_atomic_helper_set_config+0x33/0x2f0
[  246.751549]  drm_atomic_helper_set_config+0x31/0x90
[  246.751552]  __drm_mode_set_config_internal+0x5c/0x110
[  246.751554]  drm_crtc_force_disable+0x43/0x70
[  246.751555]  drm_crtc_force_disable_all+0x49/0x70
[  246.751579]  amdgpu_device_fini+0x1ad/0x1c0 [amdgpu]
[  246.751585]  amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[  246.751586]  drm_dev_unregister+0x37/0xe0
[  246.751592]  amdgpu_pci_remove+0x14/0x40 [amdgpu]
[  246.751593]  pci_device_remove+0x31/0xa0
[  246.751595]  device_release_driver_internal+0x152/0x210
[  246.751597]  driver_detach+0x32/0x70
[  246.751598]  bus_remove_driver+0x4c/0xc0
[  246.751598]  pci_unregister_driver+0x25/0xa0
[  246.751608]  amdgpu_exit+0x11/0x3b6 [amdgpu]
[  246.751610]  SyS_delete_module+0x19c/0x2a0
[  246.751611]  do_syscall_64+0x48/0xe0
[  246.751612]  entry_SYSCALL64_slow_path+0x25/0x25
[  246.751613] RIP: 0033:0x7fc7e3b70b87
[  246.751614] RSP: 002b:00007ffd6adc1d38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[  246.751614] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc7e3b70b87
[  246.751615] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000001e0f8a8
[  246.751615] RBP: 0000000001e0f840 R08: 00007ffd6adc0cb1 R09: 0000000000000000
[  246.751615] R10: 00000000000008b2 R11: 0000000000000202 R12: 00007ffd6adc3750
[  246.751616] R13: 0000000000000000 R14: 0000000001e0f840 R15: 0000000001e0f260
[  308.191107] INFO: task rmmod:1380 blocked for more than 60 seconds.
[  308.191109]       Tainted: G        W        4.15.0-1-rc #2
[  308.191110] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  308.191111] rmmod           D    0  1380   1169 0x80000080
[  308.191112] Call Trace:
[  308.191115]  ? __schedule+0x1ac/0x6d0
[  308.191116]  schedule+0x2a/0x80
[  308.191117]  schedule_preempt_disabled+0xc/0x20
[  308.191118]  __ww_mutex_lock.isra.1+0x54a/0x6c0
[  308.191120]  ? preempt_schedule_irq+0x27/0x50
[  308.191122]  ? drm_modeset_lock+0xd1/0xf0
[  308.191122]  drm_modeset_lock+0xd1/0xf0
[  308.191124]  drm_atomic_get_crtc_state+0x4f/0x100
[  308.191125]  ? retint_kernel+0x1b/0x1d
[  308.191126]  __drm_atomic_helper_set_config+0x33/0x2f0
[  308.191127]  drm_atomic_helper_set_config+0x31/0x90
[  308.191129]  __drm_mode_set_config_internal+0x5c/0x110
[  308.191130]  drm_crtc_force_disable+0x43/0x70
[  308.191131]  drm_crtc_force_disable_all+0x49/0x70
[  308.191142]  amdgpu_device_fini+0x1ad/0x1c0 [amdgpu]
[  308.191148]  amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[  308.191149]  drm_dev_unregister+0x37/0xe0
[  308.191154]  amdgpu_pci_remove+0x14/0x40 [amdgpu]
[  308.191157]  pci_device_remove+0x31/0xa0
[  308.191159]  device_release_driver_internal+0x152/0x210
[  308.191162]  driver_detach+0x32/0x70
[  308.191163]  bus_remove_driver+0x4c/0xc0
[  308.191164]  pci_unregister_driver+0x25/0xa0
[  308.191173]  amdgpu_exit+0x11/0x3b6 [amdgpu]
[  308.191175]  SyS_delete_module+0x19c/0x2a0
[  308.191176]  do_syscall_64+0x48/0xe0
[  308.191177]  entry_SYSCALL64_slow_path+0x25/0x25
[  308.191179] RIP: 0033:0x7fc7e3b70b87
[  308.191179] RSP: 002b:00007ffd6adc1d38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[  308.191180] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc7e3b70b87
[  308.191180] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000001e0f8a8
[  308.191180] RBP: 0000000001e0f840 R08: 00007ffd6adc0cb1 R09: 0000000000000000
[  308.191181] R10: 00000000000008b2 R11: 0000000000000202 R12: 00007ffd6adc3750
[  308.191181] R13: 0000000000000000 R14: 0000000001e0f840 R15: 0000000001e0f260


----------

With dc=0, amdgpu can be unloaded, though not cleanly:


[   55.321963] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:01:00.0 on minor 1
[   77.829378] Console: switching to colour dummy device 80x25

[   81.365952] [drm] amdgpu: finishing device.
[   81.397013] amdgpu: [powerplay] 
[   81.397027] amdgpu: [powerplay] 
[   81.525471] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[   81.525482] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[   81.525497] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[   81.525505] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[   81.525520] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[   81.525528] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[   81.525542] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[   81.525550] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[   81.525564] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[   81.525571] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[   81.525585] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[   81.525592] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[   81.525665] ------------[ cut here ]------------
[   81.525665] Memory manager not clean during takedown.
[   81.525674] WARNING: CPU: 1 PID: 1407 at drivers/gpu/drm/drm_mm.c:895 drm_mm_takedown+0x1b/0x20
[   81.525674] Modules linked in: amdgpu(-) chash ttm cls_u32 sch_htb af_packet nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_netdev nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio libcrc32c bcache raid0 x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net raid1 tun vhost tap kvm snd_hda_codec_realtek md_mod snd_hda_codec_generic snd_hda_codec_hdmi irqbypass snd_hda_intel intel_cstate snd_hda_codec intel_uncore intel_rapl_perf snd_hwdep efi_pstore snd_hda_core mei_me plusb usbnet snd_pcm input_leds
[   81.525693]  mei mii led_class efivars tpm_crb usbip_host usbip_core efivarfs algif_skcipher af_alg mousedev joydev psmouse crct10dif_pclmul atkbd crc32_pclmul libps2 pcspkr ghash_clmulni_intel shpchp thermal tpm_tis tpm_tis_core fan tpm i8042 acpi_pad vfat fat
[   81.525702] CPU: 1 PID: 1407 Comm: rmmod Not tainted 4.15.0-1-rc #2
[   81.525702] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[   81.525704] RIP: 0010:drm_mm_takedown+0x1b/0x20
[   81.525704] RSP: 0018:ffffbfce0297bd00 EFLAGS: 00010286
[   81.525705] RAX: 0000000000000000 RBX: ffffa261745a8a00 RCX: 0000000000000006
[   81.525705] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffffa2619ec8cab0
[   81.525705] RBP: ffffa2615c712878 R08: 0000000000000001 R09: 0000000000000582
[   81.525706] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa261745a8ae0
[   81.525706] R13: 0000000000000000 R14: 0000000000000170 R15: ffffffffc061b3e8
[   81.525706] FS:  00007ff958e5fb80(0000) GS:ffffa2619ec80000(0000) knlGS:0000000000000000
[   81.525707] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   81.525707] CR2: 00007f2352182498 CR3: 000000045a47e001 CR4: 00000000003606e0
[   81.525707] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   81.525708] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   81.525708] Call Trace:
[   81.525728]  amdgpu_vram_mgr_fini+0x22/0x50 [amdgpu]
[   81.525736]  ttm_bo_clean_mm+0x9a/0xe0 [ttm]
[   81.525743]  amdgpu_ttm_fini+0xe1/0x1e0 [amdgpu]
[   81.525750]  amdgpu_bo_fini+0x9/0x30 [amdgpu]
[   81.525758]  gmc_v8_0_sw_fini+0x29/0x50 [amdgpu]
[   81.525764]  amdgpu_fini+0x1f3/0x2f0 [amdgpu]
[   81.525769]  amdgpu_device_fini+0x5f/0x1c0 [amdgpu]
[   81.525775]  amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[   81.525777]  drm_dev_unregister+0x37/0xe0
[   81.525782]  amdgpu_pci_remove+0x14/0x40 [amdgpu]
[   81.525784]  pci_device_remove+0x31/0xa0
[   81.525786]  device_release_driver_internal+0x152/0x210
[   81.525787]  driver_detach+0x32/0x70
[   81.525788]  bus_remove_driver+0x4c/0xc0
[   81.525789]  pci_unregister_driver+0x25/0xa0
[   81.525798]  amdgpu_exit+0x11/0x3b6 [amdgpu]
[   81.525800]  SyS_delete_module+0x19c/0x2a0
[   81.525802]  do_syscall_64+0x48/0xe0
[   81.525804]  entry_SYSCALL64_slow_path+0x25/0x25
[   81.525804] RIP: 0033:0x7ff95856cb87
[   81.525805] RSP: 002b:00007ffc64d76928 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[   81.525805] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff95856cb87
[   81.525806] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000001ce88a8
[   81.525806] RBP: 0000000001ce8840 R08: 00007ffc64d758a1 R09: 0000000000000000
[   81.525806] R10: 00000000000008b2 R11: 0000000000000206 R12: 00007ffc64d78750
[   81.525807] R13: 0000000000000000 R14: 0000000001ce8840 R15: 0000000001ce8260
[   81.525807] Code: fe ff ff 48 89 31 e9 f7 fe ff ff 0f 1f 44 00 00 48 8b 47 38 48 83 c7 38 48 39 c7 75 02 f3 c3 48 c7 c7 70 f6 c2 8b e8 35 a4 b7 ff <0f> ff c3 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 
[   81.525820] ---[ end trace 6ae74b00c25ed621 ]---
[   81.525823] [TTM] Finalizing pool allocator
[   81.525824] [TTM] Finalizing DMA pool allocator
[   81.525827] BUG: scheduling while atomic: rmmod/1407/0x00000000
[   81.525827] Modules linked in: amdgpu(-) chash ttm cls_u32 sch_htb af_packet nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_netdev nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio libcrc32c bcache raid0 x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net raid1 tun vhost tap kvm snd_hda_codec_realtek md_mod snd_hda_codec_generic snd_hda_codec_hdmi irqbypass snd_hda_intel intel_cstate snd_hda_codec intel_uncore intel_rapl_perf snd_hwdep efi_pstore snd_hda_core mei_me plusb usbnet snd_pcm input_leds
[   81.525840]  mei mii led_class efivars tpm_crb usbip_host usbip_core efivarfs algif_skcipher af_alg mousedev joydev psmouse crct10dif_pclmul atkbd crc32_pclmul libps2 pcspkr ghash_clmulni_intel shpchp thermal tpm_tis tpm_tis_core fan tpm i8042 acpi_pad vfat fat
[   81.525846] CPU: 1 PID: 1407 Comm: rmmod Tainted: G        W        4.15.0-1-rc #2
[   81.525846] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[   81.525846] Call Trace:
[   81.525848]  dump_stack+0x46/0x65
[   81.525850]  __schedule_bug+0x47/0x60
[   81.525851]  __schedule+0x531/0x6d0
[   81.525852]  ? enqueue_entity+0x149/0x820
[   81.525853]  schedule+0x2a/0x80
[   81.525854]  schedule_timeout+0x1ef/0x2b0
[   81.525855]  wait_for_common+0x11c/0x1d0
[   81.525856]  ? wake_up_q+0x70/0x70
[   81.525857]  kthread_stop+0x38/0x60
[   81.525858]  destroy_workqueue+0x114/0x180
[   81.525860]  ttm_mem_global_release+0x21/0x80 [ttm]
[   81.525861]  drm_global_item_unref+0x3f/0x60
[   81.525867]  amdgpu_ttm_fini+0x179/0x1e0 [amdgpu]
[   81.525874]  amdgpu_bo_fini+0x9/0x30 [amdgpu]
[   81.525882]  gmc_v8_0_sw_fini+0x29/0x50 [amdgpu]
[   81.525887]  amdgpu_fini+0x1f3/0x2f0 [amdgpu]
[   81.525893]  amdgpu_device_fini+0x5f/0x1c0 [amdgpu]
[   81.525899]  amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[   81.525900]  drm_dev_unregister+0x37/0xe0
[   81.525905]  amdgpu_pci_remove+0x14/0x40 [amdgpu]
[   81.525906]  pci_device_remove+0x31/0xa0
[   81.525908]  device_release_driver_internal+0x152/0x210
[   81.525909]  driver_detach+0x32/0x70
[   81.525910]  bus_remove_driver+0x4c/0xc0
[   81.525911]  pci_unregister_driver+0x25/0xa0
[   81.525920]  amdgpu_exit+0x11/0x3b6 [amdgpu]
[   81.525921]  SyS_delete_module+0x19c/0x2a0
[   81.525922]  do_syscall_64+0x48/0xe0
[   81.525923]  entry_SYSCALL64_slow_path+0x25/0x25
[   81.525923] RIP: 0033:0x7ff95856cb87
[   81.525923] RSP: 002b:00007ffc64d76928 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[   81.525924] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff95856cb87
[   81.525924] RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000000001ce88a8
[   81.525925] RBP: 0000000001ce8840 R08: 00007ffc64d758a1 R09: 0000000000000000
[   81.525925] R10: 00000000000008b2 R11: 0000000000000206 R12: 00007ffc64d78750
[   81.525925] R13: 0000000000000000 R14: 0000000001ce8840 R15: 0000000001ce8260
[   81.525931] [TTM] Zone  kernel: Used memory at exit: 12 kiB
[   81.525931] [TTM] Zone   dma32: Used memory at exit: 12 kiB
[   81.525932] [drm] amdgpu: ttm finalized



However, it does not load again properly:




[  189.453963] LoadPin: kernel-module pinning-ignored obj="/usr/lib/modules/4.15.0-1-rc/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.777294] [drm] amdgpu kernel modesetting enabled.
[  189.777564] [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67EF 0x1462:0x809D 0xCF).
[  189.777579] [drm] register mmio base: 0xEFE00000
[  189.777579] [drm] register mmio size: 262144
[  189.777591] [drm] probing gen 2 caps for device 8086:1901 = 261ad03/e
[  189.777592] [drm] probing mlw for device 8086:1901 = 261ad03
[  189.777598] [drm] UVD is enabled in VM mode
[  189.777598] [drm] UVD ENC is enabled in VM mode
[  189.777600] [drm] VCE enabled in VM mode
[  189.777614] ATOM BIOS: 113-C99401-S01
[  189.777620] [drm] GPU post is not needed
[  189.777684] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 9-bit
[  189.777696] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_mc.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.777716] amdgpu 0000:01:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[  189.777717] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[  189.777719] [drm] Detected VRAM RAM=2048M, BAR=256M
[  189.777720] [drm] RAM width 128bits GDDR5
[  189.777774] [TTM] Zone  kernel: Available graphics memory: 8082548 kiB
[  189.777775] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[  189.777775] [TTM] Initializing pool allocator
[  189.777777] [TTM] Initializing DMA pool allocator
[  189.777786] [drm] amdgpu: 2048M of VRAM memory ready
[  189.777786] [drm] amdgpu: 3072M of GTT memory ready.
[  189.777793] [drm] GART: num cpu pages 65536, num gpu pages 65536
[  189.777846] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
[  189.777889] amdgpu 0000:01:00.0: amdgpu: using MSI.
[  189.777890] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[  189.777890] [drm] Driver supports precise vblank timestamp query.
[  189.777907] [drm] amdgpu: irq initialized.
[  189.777921] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[  189.777990] [drm] AMDGPU Display Connectors
[  189.777990] [drm] Connector 0:
[  189.777990] [drm]   DP-2
[  189.777990] [drm]   HPD5
[  189.777991] [drm]   DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b
[  189.777991] [drm]   Encoders:
[  189.777991] [drm]     DFP1: INTERNAL_UNIPHY1
[  189.777992] [drm] Connector 1:
[  189.777992] [drm]   HDMI-A-4
[  189.777992] [drm]   HPD3
[  189.777993] [drm]   DDC: 0x4874 0x4874 0x4875 0x4875 0x4876 0x4876 0x4877 0x4877
[  189.777993] [drm]   Encoders:
[  189.777993] [drm]     DFP2: INTERNAL_UNIPHY1
[  189.777993] [drm] Connector 2:
[  189.777993] [drm]   DVI-D-1
[  189.777994] [drm]   HPD4
[  189.777994] [drm]   DDC: 0x4878 0x4878 0x4879 0x4879 0x487a 0x487a 0x487b 0x487b
[  189.777994] [drm]   Encoders:
[  189.777995] [drm]     DFP3: INTERNAL_UNIPHY
[  189.778005] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_pfp_2.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.778019] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_me_2.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.778026] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_ce_2.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.778146] [drm] Chained IB support enabled!
[  189.778152] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_rlc.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.778176] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_mec_2.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.778326] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_mec2_2.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.778966] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x00000000456538e8
[  189.779035] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x0000000087651bf6
[  189.779057] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x00000000680696bd
[  189.779077] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x00000000c26caa62
[  189.779092] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x0000000053481efe
[  189.779109] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x00000000582def0e
[  189.779125] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x0000000093e310d1
[  189.779140] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x00000000e9eccf7c
[  189.779153] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x00000000b4fa868a
[  189.779166] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x00000000439981f7
[  189.779421] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_sdma.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.779435] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_sdma1.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.779721] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x00000000c76a4e3d
[  189.779786] amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x00000000a47d905e
[  189.779809] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_uvd.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.779867] [drm] Found UVD firmware Version: 1.79 Family ID: 16
[  189.780944] amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x000000004fdb73f3
[  189.782125] amdgpu 0000:01:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x0000000077cfc94d
[  189.782161] amdgpu 0000:01:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x000000002b4b50a7
[  189.782172] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/amdgpu/polaris11_vce.bin" pid=1592 cmdline="modprobe amdgpu dc=0 disp_priority=1"
[  189.782215] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[  189.782543] amdgpu 0000:01:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x00000000110798ab
[  189.782697] amdgpu 0000:01:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x000000005d049f27
[  189.789375] amdgpu: [powerplay] 
[  189.789392] amdgpu: [powerplay] 
[  189.794892] [drm] ring test on 0 succeeded in 18 usecs
[  189.986988] [drm:gfx_v8_0_kiq_resume [amdgpu]] *ERROR* KCQ enable failed (scratch(0xC040)=0xCAFEDEAD)
[  189.987006] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -22
[  189.987007] amdgpu 0000:01:00.0: amdgpu_init failed
[  189.987438] amdgpu: [powerplay] 
[  189.987453] amdgpu: [powerplay] 
[  190.039125] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[  190.039145] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[  190.039170] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[  190.039187] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[  190.039210] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[  190.039227] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[  190.039250] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[  190.039266] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[  190.039289] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[  190.039305] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[  190.039327] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[  190.039343] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
[  190.039426] ------------[ cut here ]------------
[  190.039427] Memory manager not clean during takedown.
[  190.039435] WARNING: CPU: 1 PID: 1592 at drivers/gpu/drm/drm_mm.c:895 drm_mm_takedown+0x1b/0x20
[  190.039436] Modules linked in: amdgpu(+) chash ttm cls_u32 sch_htb af_packet nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_netdev nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio libcrc32c bcache raid0 x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net raid1 tun vhost tap kvm snd_hda_codec_realtek md_mod snd_hda_codec_generic snd_hda_codec_hdmi irqbypass snd_hda_intel intel_cstate snd_hda_codec intel_uncore intel_rapl_perf snd_hwdep efi_pstore snd_hda_core mei_me plusb usbnet snd_pcm input_leds
[  190.039455]  mei mii led_class efivars tpm_crb usbip_host usbip_core efivarfs algif_skcipher af_alg mousedev joydev psmouse crct10dif_pclmul atkbd crc32_pclmul libps2 pcspkr ghash_clmulni_intel shpchp thermal tpm_tis tpm_tis_core fan tpm i8042 acpi_pad vfat fat [last unloaded: amdgpu]
[  190.039464] CPU: 1 PID: 1592 Comm: modprobe Tainted: G        W        4.15.0-1-rc #2
[  190.039464] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[  190.039465] RIP: 0010:drm_mm_takedown+0x1b/0x20
[  190.039466] RSP: 0018:ffffbfce024cf9f0 EFLAGS: 00010282
[  190.039466] RAX: 0000000000000000 RBX: ffffa2615a4bd400 RCX: 0000000000000006
[  190.039467] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffffa2619ec8cab0
[  190.039467] RBP: ffffa26148082878 R08: 0000000000000001 R09: 0000000000000645
[  190.039467] R10: ffffbfce024cf998 R11: 0000000000000000 R12: ffffa2615a4bd4e0
[  190.039468] R13: 0000000000000000 R14: 0000000000000170 R15: ffffa26148082f28
[  190.039468] FS:  00007f164a557b80(0000) GS:ffffa2619ec80000(0000) knlGS:0000000000000000
[  190.039469] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  190.039469] CR2: 000055c3a7a98510 CR3: 000000045adfc002 CR4: 00000000003606e0
[  190.039469] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  190.039470] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  190.039470] Call Trace:
[  190.039489]  amdgpu_vram_mgr_fini+0x22/0x50 [amdgpu]
[  190.039492]  ttm_bo_clean_mm+0x9a/0xe0 [ttm]
[  190.039508]  amdgpu_ttm_fini+0xe1/0x1e0 [amdgpu]
[  190.039524]  amdgpu_bo_fini+0x9/0x30 [amdgpu]
[  190.039540]  gmc_v8_0_sw_fini+0x29/0x50 [amdgpu]
[  190.039555]  amdgpu_fini+0x1f3/0x2f0 [amdgpu]
[  190.039569]  amdgpu_device_init+0xda2/0x14d0 [amdgpu]
[  190.039571]  ? kernfs_activate+0x5e/0x80
[  190.039572]  ? kernfs_add_one+0xdf/0x130
[  190.039586]  amdgpu_driver_load_kms+0x43/0x1a0 [amdgpu]
[  190.039588]  drm_dev_register+0x12a/0x1b0
[  190.039601]  amdgpu_pci_probe+0x104/0x140 [amdgpu]
[  190.039603]  pci_device_probe+0xc3/0x140
[  190.039605]  driver_probe_device+0x307/0x470
[  190.039607]  __driver_attach+0x98/0xe0
[  190.039608]  ? driver_probe_device+0x470/0x470
[  190.039609]  bus_for_each_dev+0x64/0xb0
[  190.039610]  bus_add_driver+0x1b8/0x260
[  190.039611]  ? 0xffffffffc084d000
[  190.039611]  driver_register+0x52/0xc0
[  190.039612]  ? 0xffffffffc084d000
[  190.039613]  do_one_initcall+0x46/0x180
[  190.039614]  do_init_module+0x51/0x1c9
[  190.039616]  load_module+0x23c4/0x2a30
[  190.039618]  ? SyS_finit_module+0xea/0x110
[  190.039619]  SyS_finit_module+0xea/0x110
[  190.039620]  do_syscall_64+0x48/0xe0
[  190.039622]  entry_SYSCALL64_slow_path+0x25/0x25
[  190.039622] RIP: 0033:0x7f1649c5ee29
[  190.039623] RSP: 002b:00007ffc877e3828 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[  190.039623] RAX: ffffffffffffffda RBX: 000000000192bbc0 RCX: 00007f1649c5ee29
[  190.039624] RDX: 0000000000000000 RSI: 000000000192c0a0 RDI: 0000000000000003
[  190.039624] RBP: 000000000192c0a0 R08: 0000000000000000 R09: 000000000000000f
[  190.039624] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[  190.039625] R13: 000000000192bc60 R14: 0000000000040000 R15: 000000000192b910
[  190.039625] Code: fe ff ff 48 89 31 e9 f7 fe ff ff 0f 1f 44 00 00 48 8b 47 38 48 83 c7 38 48 39 c7 75 02 f3 c3 48 c7 c7 70 f6 c2 8b e8 35 a4 b7 ff <0f> ff c3 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 
[  190.039638] ---[ end trace 6ae74b00c25ed622 ]---
[  190.039641] [TTM] Finalizing pool allocator
[  190.039642] [TTM] Finalizing DMA pool allocator
[  190.039646] BUG: scheduling while atomic: modprobe/1592/0x00000000
[  190.039646] Modules linked in: amdgpu(+) chash ttm cls_u32 sch_htb af_packet nft_limit nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_netdev nf_tables_inet nf_tables_ipv6 nf_tables_ipv4 dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio libcrc32c bcache raid0 x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net raid1 tun vhost tap kvm snd_hda_codec_realtek md_mod snd_hda_codec_generic snd_hda_codec_hdmi irqbypass snd_hda_intel intel_cstate snd_hda_codec intel_uncore intel_rapl_perf snd_hwdep efi_pstore snd_hda_core mei_me plusb usbnet snd_pcm input_leds
[  190.039663]  mei mii led_class efivars tpm_crb usbip_host usbip_core efivarfs algif_skcipher af_alg mousedev joydev psmouse crct10dif_pclmul atkbd crc32_pclmul libps2 pcspkr ghash_clmulni_intel shpchp thermal tpm_tis tpm_tis_core fan tpm i8042 acpi_pad vfat fat [last unloaded: amdgpu]
[  190.039672] CPU: 1 PID: 1592 Comm: modprobe Tainted: G        W        4.15.0-1-rc #2
[  190.039673] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[  190.039673] Call Trace:
[  190.039675]  dump_stack+0x46/0x65
[  190.039677]  __schedule_bug+0x47/0x60
[  190.039677]  __schedule+0x531/0x6d0
[  190.039679]  ? enqueue_entity+0x149/0x820
[  190.039680]  ? sched_clock_cpu+0xc/0xc0
[  190.039680]  schedule+0x2a/0x80
[  190.039681]  schedule_timeout+0x1ef/0x2b0
[  190.039682]  wait_for_common+0x11c/0x1d0
[  190.039683]  ? wake_up_q+0x70/0x70
[  190.039684]  kthread_stop+0x38/0x60
[  190.039685]  destroy_workqueue+0x114/0x180
[  190.039687]  ttm_mem_global_release+0x21/0x80 [ttm]
[  190.039688]  drm_global_item_unref+0x3f/0x60
[  190.039703]  amdgpu_ttm_fini+0x179/0x1e0 [amdgpu]
[  190.039718]  amdgpu_bo_fini+0x9/0x30 [amdgpu]
[  190.039735]  gmc_v8_0_sw_fini+0x29/0x50 [amdgpu]
[  190.039749]  amdgpu_fini+0x1f3/0x2f0 [amdgpu]
[  190.039763]  amdgpu_device_init+0xda2/0x14d0 [amdgpu]
[  190.039764]  ? kernfs_activate+0x5e/0x80
[  190.039765]  ? kernfs_add_one+0xdf/0x130
[  190.039779]  amdgpu_driver_load_kms+0x43/0x1a0 [amdgpu]
[  190.039780]  drm_dev_register+0x12a/0x1b0
[  190.039793]  amdgpu_pci_probe+0x104/0x140 [amdgpu]
[  190.039794]  pci_device_probe+0xc3/0x140
[  190.039795]  driver_probe_device+0x307/0x470
[  190.039796]  __driver_attach+0x98/0xe0
[  190.039797]  ? driver_probe_device+0x470/0x470
[  190.039798]  bus_for_each_dev+0x64/0xb0
[  190.039800]  bus_add_driver+0x1b8/0x260
[  190.039800]  ? 0xffffffffc084d000
[  190.039801]  driver_register+0x52/0xc0
[  190.039801]  ? 0xffffffffc084d000
[  190.039802]  do_one_initcall+0x46/0x180
[  190.039802]  do_init_module+0x51/0x1c9
[  190.039804]  load_module+0x23c4/0x2a30
[  190.039805]  ? SyS_finit_module+0xea/0x110
[  190.039806]  SyS_finit_module+0xea/0x110
[  190.039807]  do_syscall_64+0x48/0xe0
[  190.039808]  entry_SYSCALL64_slow_path+0x25/0x25
[  190.039809] RIP: 0033:0x7f1649c5ee29
[  190.039809] RSP: 002b:00007ffc877e3828 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[  190.039810] RAX: ffffffffffffffda RBX: 000000000192bbc0 RCX: 00007f1649c5ee29
[  190.039810] RDX: 0000000000000000 RSI: 000000000192c0a0 RDI: 0000000000000003
[  190.039810] RBP: 000000000192c0a0 R08: 0000000000000000 R09: 000000000000000f
[  190.039811] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[  190.039811] R13: 000000000192bc60 R14: 0000000000040000 R15: 000000000192b910
[  190.039821] [TTM] Zone  kernel: Used memory at exit: 12 kiB
[  190.039822] [TTM] Zone   dma32: Used memory at exit: 12 kiB
[  190.039823] [drm] amdgpu: ttm finalized
[  190.039827] amdgpu 0000:01:00.0: Fatal error during GPU init
[  190.039829] [drm] amdgpu: finishing device.
[  190.039829] [TTM] Memory type 2 has not been initialized

[  190.040053] amdgpu: probe of 0000:01:00.0 failed with error -22
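For triage, the failed reload above can be boiled down to its `*ERROR*` lines. The following is only a minimal sketch, assuming the dmesg output has been saved as plain text; the `summarize` helper and both regular expressions are hypothetical and written against the line format seen in this log, not against any stable kernel log format.

```python
import re

# Hypothetical patterns, derived only from lines in this report, e.g.:
# [  189.987006] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -22
ERROR_RE = re.compile(r"\[drm:(?P<func>\w+) \[amdgpu\]\] \*ERROR\* (?P<msg>.*)")
HW_INIT_RE = re.compile(r"hw_init of IP block <(?P<block>\w+)> failed (?P<code>-?\d+)")


def summarize(log_text):
    """Return (first_error, failed_block, counts) for a dmesg capture.

    first_error  -- (function, message) of the earliest *ERROR* line, or None
    failed_block -- (ip_block, errno) of the first hw_init failure, or None
    counts       -- number of *ERROR* lines per reporting function
    """
    first_error = None
    failed_block = None
    counts = {}
    for line in log_text.splitlines():
        m = ERROR_RE.search(line)
        if not m:
            continue
        func = m.group("func")
        msg = m.group("msg").strip()
        counts[func] = counts.get(func, 0) + 1
        if first_error is None:
            first_error = (func, msg)
        hw = HW_INIT_RE.search(msg)
        if hw and failed_block is None:
            failed_block = (hw.group("block"), int(hw.group("code")))
    return first_error, failed_block, counts


# Sample lines copied from the log above.
SAMPLE = """\
[  189.794892] [drm] ring test on 0 succeeded in 18 usecs
[  189.986988] [drm:gfx_v8_0_kiq_resume [amdgpu]] *ERROR* KCQ enable failed (scratch(0xC040)=0xCAFEDEAD)
[  189.987006] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -22
[  190.039125] [drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[  190.039145] [drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)
"""

if __name__ == "__main__":
    first, block, counts = summarize(SAMPLE)
    print("first error:", first)
    print("failed IP block:", block)
    print("errors per function:", counts)
```

Run against this log, the earliest error is the KCQ enable failure in `gfx_v8_0_kiq_resume`; the pageflip and interrupt errors that follow come after `amdgpu_init failed`, during teardown.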
Comment 4 Mikita Lipski 2017-12-21 19:41:28 UTC
I am encountering a deadlock in DRM while trying to force-disable CRTCs.
If no displays are connected, the system hard hangs.
If DC is disabled (with any number of displays), the system hard hangs.

Currently investigating the deadlock issue.
Thanks
Comment 5 Harry Wentland 2018-01-29 16:16:52 UTC
Can you try with the latest amd-staging-drm-next from https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

We just fixed a bunch of driver unload issues. It should be fixed now.
Comment 6 Luke McKee 2018-02-22 19:28:20 UTC
Harry, I tried the latest staging-next 2 days ago.

I suggest you look at a long-standing issue with powerplay and buggy AMI BIOSes that don't properly set up MMIO BAR regions. Your driver needs to work around this, because the vendor says in support emails that a 2-year-old motherboard is too old to receive firmware updates, even though Intel ME is a massive security risk. The way things are going, I actually think you need to add linuxbios/coreboot support to your driver, if available:

https://forum-en.msi.com/index.php?topic=298468.0

I think Polaris 11 cards on boards with buggy BIOSes that don't properly set up PCIe MMIO BAR ranges have a hard time with the amdgpu driver and powerplay (no fan control, cooked cards, etc.). With your new powerplay code it may get even worse and just not boot in this condition. It works OK for me on 4.14.20, but with exactly the same config and amdgpu-staging-next as last updated 4 days ago I get this mess.

Feb 23 01:41:59 hojuruku kernel: [drm] amdgpu kernel modesetting enabled.
Feb 23 01:41:59 hojuruku kernel: checking generic (e0000000 300000) vs hw (e0000000 10000000)
Feb 23 01:41:59 hojuruku kernel: fb: switching to amdgpudrmfb from EFI VGA
Feb 23 01:41:59 hojuruku kernel: Console: switching to colour dummy device 80x25
Feb 23 01:41:59 hojuruku kernel: amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
Feb 23 01:41:59 hojuruku kernel: [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1462:0x8A91 0xCF).
Feb 23 01:41:59 hojuruku kernel: [drm] register mmio base: 0xF7E00000
Feb 23 01:41:59 hojuruku kernel: [drm] register mmio size: 262144
Feb 23 01:41:59 hojuruku kernel: [drm] add ip block number 0 <vi_common>
Feb 23 01:41:59 hojuruku kernel: [drm] add ip block number 1 <gmc_v8_0>
Feb 23 01:41:59 hojuruku kernel: [drm] add ip block number 2 <tonga_ih>
Feb 23 01:41:59 hojuruku kernel: [drm] add ip block number 3 <amdgpu_powerplay>
Feb 23 01:41:59 hojuruku kernel: [drm] add ip block number 4 <dce_v11_0>
Feb 23 01:41:59 hojuruku kernel: [drm] add ip block number 5 <gfx_v8_0>
Feb 23 01:41:59 hojuruku kernel: [drm] add ip block number 6 <sdma_v3_0>
Feb 23 01:41:59 hojuruku kernel: [drm] add ip block number 7 <uvd_v6_0>
Feb 23 01:41:59 hojuruku kernel: [drm] add ip block number 8 <vce_v3_0>
Feb 23 01:41:59 hojuruku kernel: [drm] probing gen 2 caps for device 8086:c01 = 261ad03/e
Feb 23 01:41:59 hojuruku kernel: [drm] probing mlw for device 8086:c01 = 261ad03
Feb 23 01:41:59 hojuruku kernel: [drm] UVD is enabled in VM mode
Feb 23 01:41:59 hojuruku kernel: [drm] UVD ENC is enabled in VM mode
Feb 23 01:41:59 hojuruku kernel: [drm] VCE enabled in VM mode
Feb 23 01:41:59 hojuruku kernel: ATOM BIOS: 113-C98121-M01
Feb 23 01:41:59 hojuruku kernel: [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
Feb 23 01:41:59 hojuruku kernel: amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
Feb 23 01:41:59 hojuruku kernel: amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
Feb 23 01:41:59 hojuruku kernel: [drm] Detected VRAM RAM=4096M, BAR=256M
Feb 23 01:41:59 hojuruku kernel: [drm] RAM width 128bits GDDR5
Feb 23 01:41:59 hojuruku kernel: [TTM] Zone  kernel: Available graphics memory: 8174838 kiB
Feb 23 01:41:59 hojuruku kernel: [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
Feb 23 01:41:59 hojuruku kernel: [TTM] Initializing pool allocator
Feb 23 01:41:59 hojuruku kernel: [TTM] Initializing DMA pool allocator
Feb 23 01:41:59 hojuruku kernel: [drm] amdgpu: 4096M of VRAM memory ready
Feb 23 01:41:59 hojuruku kernel: [drm] amdgpu: 4096M of GTT memory ready.
Feb 23 01:41:59 hojuruku kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
Feb 23 01:41:59 hojuruku kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
Feb 23 01:41:59 hojuruku kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Feb 23 01:41:59 hojuruku kernel: [drm] Driver supports precise vblank timestamp query.
Feb 23 01:41:59 hojuruku kernel: [drm] AMDGPU Display Connectors
Feb 23 01:41:59 hojuruku kernel: [drm] Connector 0:
Feb 23 01:41:59 hojuruku kernel: [drm]   DP-1
Feb 23 01:41:59 hojuruku kernel: [drm]   HPD2
Feb 23 01:41:59 hojuruku kernel: [drm]   DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b
Feb 23 01:41:59 hojuruku kernel: [drm]   Encoders:
Feb 23 01:41:59 hojuruku kernel: [drm]     DFP1: INTERNAL_UNIPHY1
Feb 23 01:41:59 hojuruku kernel: [drm] Connector 1:
Feb 23 01:41:59 hojuruku kernel: [drm]   HDMI-A-1
Feb 23 01:41:59 hojuruku kernel: [drm]   HPD5
Feb 23 01:41:59 hojuruku kernel: [drm]   DDC: 0x4874 0x4874 0x4875 0x4875 0x4876 0x4876 0x4877 0x4877
Feb 23 01:41:59 hojuruku kernel: [drm]   Encoders:
Feb 23 01:41:59 hojuruku kernel: [drm]     DFP2: INTERNAL_UNIPHY1
Feb 23 01:41:59 hojuruku kernel: [drm] Connector 2:
Feb 23 01:41:59 hojuruku kernel: [drm]   DVI-D-1
Feb 23 01:41:59 hojuruku kernel: [drm]   HPD3
Feb 23 01:41:59 hojuruku kernel: [drm]   DDC: 0x4878 0x4878 0x4879 0x4879 0x487a 0x487a 0x487b 0x487b
Feb 23 01:41:59 hojuruku kernel: [drm]   Encoders:
Feb 23 01:41:59 hojuruku kernel: [drm]     DFP3: INTERNAL_UNIPHY
Feb 23 01:41:59 hojuruku kernel: [drm] Chained IB support enabled!


These are the issues I've had with it:
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: (II) AMDGPU(0): Number of EDID sections to follow: 1
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: (II) AMDGPU(0): EDID (in hex):
Feb 23 01:42:00 hojuruku kernel: WARNING: CPU: 3 PID: 816 at drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c:326 amdgpu_sa_bo_new+0x463/0x480 [amdgpu]
Feb 23 01:42:00 hojuruku kernel: Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic cmdlinepart amdgpu(+) chash i2c_algo_bit gpu_sched snd_hda_intel(+) snd_hda_codec ttm drm_kms_helper snd_hwdep intel_spi_platform intel_spi drm spi_nor snd_hda_core ehci_pci mtd agpgart snd_pcm syscopyarea sysfillrect ehci_hcd snd_timer sysimgblt fb_sys_fops snd soundcore xhci_pci xhci_hcd
Feb 23 01:42:00 hojuruku kernel: CPU: 3 PID: 816 Comm: X Tainted: G        W        4.15.0-rc4-haswell+ #1
Feb 23 01:42:00 hojuruku kernel: Hardware name: MSI MS-7850/B85-G41 PC Mate(MS-7850), BIOS V2.10B3 02/18/2016
Feb 23 01:42:00 hojuruku kernel: RIP: 0010:amdgpu_sa_bo_new+0x463/0x480 [amdgpu]
Feb 23 01:42:00 hojuruku kernel: RSP: 0018:ffffc900025938d0 EFLAGS: 00010287
Feb 23 01:42:00 hojuruku kernel: RAX: ffff8803fd82ac00 RBX: ffff880406520000 RCX: 0000000000000100
Feb 23 01:42:00 hojuruku kernel: RDX: 0000000000000040 RSI: ffff8803fd82ae60 RDI: ffff880406523480
Feb 23 01:42:00 hojuruku kernel: RBP: ffff8803fd82ae60 R08: ffff88041dd9c600 R09: ffff8803fd82ac00
Feb 23 01:42:00 hojuruku kernel: R10: ffffc90002593aa8 R11: ffff880404c98090 R12: 0000000000000000
Feb 23 01:42:00 hojuruku kernel: R13: 0000000000000000 R14: 0000000000000400 R15: ffff880404c98000
Feb 23 01:42:00 hojuruku kernel: FS:  00007f491dd2f300(0000) GS:ffff88041dd80000(0000) knlGS:0000000000000000
Feb 23 01:42:00 hojuruku kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 23 01:42:00 hojuruku kernel: CR2: 000055b6b5d201b8 CR3: 0000000404352006 CR4: 00000000001606e0
Feb 23 01:42:00 hojuruku kernel: Call Trace:
Feb 23 01:42:00 hojuruku kernel:  ? ttm_bo_handle_move_mem+0x28c/0x5b0 [ttm]
Feb 23 01:42:00 hojuruku kernel:  ? ___slab_alloc+0x416/0x5b0
Feb 23 01:42:00 hojuruku kernel:  ? amdgpu_vram_mgr_new+0x1d9/0x2a0 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  ? ttm_bo_validate+0x104/0x110 [ttm]
Feb 23 01:42:00 hojuruku kernel:  ? security_capable+0x4f/0x70
Feb 23 01:42:00 hojuruku kernel:  ? amdgpu_job_alloc+0x45/0xc0 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  ? ttm_bo_init_reserved+0x2d4/0x450 [ttm]
Feb 23 01:42:00 hojuruku kernel:  ? amdgpu_job_alloc+0x45/0xc0 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  ? __slab_alloc+0x2a/0x40
Feb 23 01:42:00 hojuruku kernel:  amdgpu_ib_get+0x3b/0xa0 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  amdgpu_job_alloc_with_ib+0x50/0x90 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  amdgpu_vm_clear_bo+0xe4/0x2b0 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  amdgpu_vm_alloc_levels+0x1f6/0x350 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  amdgpu_vm_alloc_pts+0x5b/0x90 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  amdgpu_gem_va_ioctl+0x27c/0x510 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  ? amdgpu_gem_create_ioctl+0x180/0x250 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  ? amdgpu_gem_metadata_ioctl+0x1b0/0x1b0 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  ? drm_ioctl_kernel+0x63/0xb0 [drm]
Feb 23 01:42:00 hojuruku kernel:  drm_ioctl_kernel+0x63/0xb0 [drm]
Feb 23 01:42:00 hojuruku kernel:  drm_ioctl+0x2dc/0x380 [drm]
Feb 23 01:42:00 hojuruku kernel:  ? amdgpu_gem_metadata_ioctl+0x1b0/0x1b0 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  ? __handle_mm_fault+0x875/0xf50
Feb 23 01:42:00 hojuruku kernel:  amdgpu_drm_ioctl+0x57/0x90 [amdgpu]
Feb 23 01:42:00 hojuruku kernel:  do_vfs_ioctl+0x97/0x5e0
Feb 23 01:42:00 hojuruku kernel:  ? handle_mm_fault+0xd2/0x1a0
Feb 23 01:42:00 hojuruku kernel:  ? __do_page_fault+0x223/0x3f0
Feb 23 01:42:00 hojuruku kernel:  SyS_ioctl+0x7e/0x90
Feb 23 01:42:00 hojuruku kernel:  entry_SYSCALL_64_fastpath+0x1a/0x7d
Feb 23 01:42:00 hojuruku kernel: RIP: 0033:0x7f491efaa6f7
Feb 23 01:42:00 hojuruku kernel: RSP: 002b:00007fff8e587648 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Feb 23 01:42:00 hojuruku kernel: RAX: ffffffffffffffda RBX: 000055b6b5d1fcb0 RCX: 00007f491efaa6f7
Feb 23 01:42:00 hojuruku kernel: RDX: 00007fff8e5876b0 RSI: 00000000c0286448 RDI: 000000000000000e
Feb 23 01:42:00 hojuruku kernel: RBP: 00007f491f2701f8 R08: 0000000100200000 R09: 000000000000000e
Feb 23 01:42:00 hojuruku kernel: R10: 000000000000000c R11: 0000000000000246 R12: 00007f491f26fad8
Feb 23 01:42:00 hojuruku kernel: R13: 00000000000004f0 R14: 00007f491f26fa80 R15: 00000000000052e0
Feb 23 01:42:00 hojuruku kernel: Code: 00 00 00 e9 21 ff ff ff 8b 54 24 1c 8b 74 24 18 48 8b 3c 24 e8 0f f7 ff ff 84 c0 74 9a eb c2 0f ff bb ea ff ff ff e9 fc fc ff ff <0f> ff bb ea ff ff ff e9 f0 fc ff ff bb f4 ff ff ff e9 e6 fc ff 
Feb 23 01:42:00 hojuruku kernel: ---[ end trace de335683cd4d1a49 ]---
Feb 23 01:42:00 hojuruku kernel: amdgpu 0000:01:00.0: failed to get a new IB (-22)
Feb 23 01:42:00 hojuruku kernel: amdgpu 0000:01:00.0: failed to get a new IB (-22)
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: Failed to allocate front buffer memory
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: (EE) AMDGPU(0): amdgpu_setup_kernel_mem failed
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: (EE)
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: Fatal server error:
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: (EE) AddScreen/ScreenInit failed for driver 0
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: (EE)
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: Please consult the The X.Org Foundation support
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]:          at http://wiki.x.org
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]:  for help.
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: (EE)
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: (EE) Server terminated with error (1). Closing log file.
Feb 23 01:42:00 hojuruku /usr/libexec/gdm-x-session[814]: Unable to run X server

then a systemd loop with

Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]: (II) UnloadModule: "modesetting"
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]: (EE) Device(s) detected, but none match those in the config file.
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]: (EE)
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]: Fatal server error:
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]: (EE) no screens found(EE)
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]: (EE)
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]: Please consult the The X.Org Foundation support
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]:          at http://wiki.x.org
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]:  for help.
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]: (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
Feb 23 01:42:07 hojuruku /usr/libexec/gdm-x-session[1070]: (EE)
Feb 23 01:42:07 hojuruku kernel: amdgpu 0000:01:00.0: failed to get a new IB (-22)
Feb 23 01:42:07 hojuruku kernel: amdgpu 0000:01:00.0: failed to get a new IB (-22)
Comment 7 Harry Wentland 2018-02-22 20:50:25 UTC
Luke, does your comment relate to driver unload, specifically driver unload throwing NULL pointer dereference?

If not, please open a separate ticket so as not to confuse two issues.
Comment 8 Luke McKee 2018-02-25 05:44:44 UTC
I commented because the user here had the error relating to powerplay failing that I have seen in other threads.

https://forum-en.msi.com/index.php?topic=298468.0

[46667.236028] kernel: amdgpu: [powerplay] 
                        failed to send message 309 ret is 254 
[46667.236041] kernel: amdgpu: [powerplay] 
                        failed to send pre message 14e ret is 254 

I thought this issue might be related to the motherboard BIOS not properly setting up MMIO BARs.

I'll open a separate ticket in a week if Polaris 11 support is still broken after your next push to Linux 4.16. I'm sure you at AMD are aware of it.
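The powerplay failures quoted above are split across two lines: the `[powerplay]` tag on one line and the message text on the next. As a rough sketch only, this is one way to pull them out of a saved log for comparison between reports; the `powerplay_failures` helper and its regular expressions are hypothetical, written against the two messages quoted in this comment.

```python
import re

# Hypothetical patterns based on the quoted lines:
#   [46667.236028] kernel: amdgpu: [powerplay]
#                           failed to send message 309 ret is 254
TAG_RE = re.compile(r"amdgpu: \[powerplay\]\s*$")
MSG_RE = re.compile(
    r"failed to send (?P<kind>pre message|message) "
    r"(?P<msg>[0-9a-fA-F]+) ret is (?P<ret>\d+)"
)


def powerplay_failures(log_text):
    """Return a list of (kind, message_id, ret) for powerplay send failures.

    message_id is kept as the string printed in the log, since the log
    does not say which base it is in.
    """
    lines = log_text.splitlines()
    results = []
    # Stop one line early so lines[i + 1] is always valid.
    for i, line in enumerate(lines[:-1]):
        if not TAG_RE.search(line):
            continue
        m = MSG_RE.search(lines[i + 1])
        if m:
            results.append((m.group("kind"), m.group("msg"), int(m.group("ret"))))
    return results


# Sample copied from the lines quoted in this comment.
SAMPLE = (
    "[46667.236028] kernel: amdgpu: [powerplay] \n"
    "                        failed to send message 309 ret is 254 \n"
    "[46667.236041] kernel: amdgpu: [powerplay] \n"
    "                        failed to send pre message 14e ret is 254 \n"
)

if __name__ == "__main__":
    for kind, msg, ret in powerplay_failures(SAMPLE):
        print(kind, msg, ret)
```

A `[powerplay]` tag followed by an empty line, as in the 4.15 reload log earlier in this report, yields no entry, because the message text is missing there.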
Comment 9 Luke McKee 2018-02-26 08:09:21 UTC
Using Polaris 11, I get the same error message, by the way, when loading the latest (today's) amd-staging-drm-next. I'll include debugging symbols and try again.

Feb 26 14:56:18 hojuruku kernel: Linux agpgart interface v0.103
Feb 26 14:56:18 hojuruku kernel: [drm] amdgpu kernel modesetting enabled.
Feb 26 14:56:18 hojuruku kernel: checking generic (e0000000 300000) vs hw (e0000000 10000000)
Feb 26 14:56:18 hojuruku kernel: fb: switching to amdgpudrmfb from EFI VGA
Feb 26 14:56:18 hojuruku kernel: Console: switching to colour dummy device 80x25
Feb 26 14:56:18 hojuruku kernel: amdgpu 0000:01:00.0: enabling device (0006 -> 0007)
Feb 26 14:56:18 hojuruku kernel: [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1462:0x8A91 0xCF).
Feb 26 14:56:18 hojuruku kernel: [drm] register mmio base: 0xF7E00000
Feb 26 14:56:18 hojuruku kernel: [drm] register mmio size: 262144
Feb 26 14:56:18 hojuruku kernel: [drm] add ip block number 0 <vi_common>
Feb 26 14:56:18 hojuruku kernel: [drm] add ip block number 1 <gmc_v8_0>
Feb 26 14:56:18 hojuruku kernel: [drm] add ip block number 2 <tonga_ih>
Feb 26 14:56:18 hojuruku kernel: [drm] add ip block number 3 <amdgpu_powerplay>
Feb 26 14:56:18 hojuruku kernel: [drm] add ip block number 4 <dce_v11_0>
Feb 26 14:56:18 hojuruku kernel: [drm] add ip block number 5 <gfx_v8_0>
Feb 26 14:56:18 hojuruku kernel: [drm] add ip block number 6 <sdma_v3_0>
Feb 26 14:56:18 hojuruku kernel: [drm] add ip block number 7 <uvd_v6_0>
Feb 26 14:56:18 hojuruku kernel: [drm] add ip block number 8 <vce_v3_0>
Feb 26 14:56:18 hojuruku kernel: [drm] probing gen 2 caps for device 8086:c01 = 261ad03/e
Feb 26 14:56:18 hojuruku kernel: [drm] probing mlw for device 8086:c01 = 261ad03
Feb 26 14:56:18 hojuruku kernel: [drm] UVD is enabled in VM mode
Feb 26 14:56:18 hojuruku kernel: [drm] UVD ENC is enabled in VM mode
Feb 26 14:56:18 hojuruku kernel: [drm] VCE enabled in VM mode
Feb 26 14:56:18 hojuruku kernel: ATOM BIOS: 113-C98121-M01
Feb 26 14:56:18 hojuruku kernel: [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
Feb 26 14:56:18 hojuruku kernel: amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
Feb 26 14:56:18 hojuruku kernel: amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
Feb 26 14:56:18 hojuruku kernel: [drm] Detected VRAM RAM=4096M, BAR=256M
Feb 26 14:56:18 hojuruku kernel: [drm] RAM width 128bits GDDR5
Feb 26 14:56:18 hojuruku kernel: [TTM] Zone  kernel: Available graphics memory: 8175204 kiB
Feb 26 14:56:18 hojuruku kernel: [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
Feb 26 14:56:18 hojuruku kernel: [TTM] Initializing pool allocator
Feb 26 14:56:18 hojuruku kernel: [TTM] Initializing DMA pool allocator
Feb 26 14:56:18 hojuruku kernel: [drm] amdgpu: 4096M of VRAM memory ready
Feb 26 14:56:18 hojuruku kernel: [drm] amdgpu: 4096M of GTT memory ready.
Feb 26 14:56:18 hojuruku kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
Feb 26 14:56:18 hojuruku kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
Feb 26 14:56:18 hojuruku kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Feb 26 14:56:18 hojuruku kernel: [drm] Driver supports precise vblank timestamp query.
Feb 26 14:56:18 hojuruku kernel: [drm] AMDGPU Display Connectors
Feb 26 14:56:18 hojuruku kernel: [drm] Connector 0:
Feb 26 14:56:18 hojuruku kernel: [drm]   DP-1
Feb 26 14:56:18 hojuruku kernel: [drm]   HPD2
Feb 26 14:56:18 hojuruku kernel: [drm]   DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b
Feb 26 14:56:18 hojuruku kernel: [drm]   Encoders:
Feb 26 14:56:18 hojuruku kernel: [drm]     DFP1: INTERNAL_UNIPHY1
Feb 26 14:56:18 hojuruku kernel: [drm] Connector 1:
Feb 26 14:56:18 hojuruku kernel: [drm]   HDMI-A-1
Feb 26 14:56:18 hojuruku kernel: [drm]   HPD5
Feb 26 14:56:18 hojuruku kernel: [drm]   DDC: 0x4874 0x4874 0x4875 0x4875 0x4876 0x4876 0x4877 0x4877
Feb 26 14:56:18 hojuruku kernel: [drm]   Encoders:
Feb 26 14:56:18 hojuruku kernel: [drm]     DFP2: INTERNAL_UNIPHY1
Feb 26 14:56:18 hojuruku kernel: [drm] Connector 2:
Feb 26 14:56:18 hojuruku kernel: [drm]   DVI-D-1
Feb 26 14:56:18 hojuruku kernel: [drm]   HPD3
Feb 26 14:56:18 hojuruku kernel: [drm]   DDC: 0x4878 0x4878 0x4879 0x4879 0x487a 0x487a 0x487b 0x487b
Feb 26 14:56:18 hojuruku kernel: [drm]   Encoders:
Feb 26 14:56:18 hojuruku kernel: [drm]     DFP3: INTERNAL_UNIPHY
Feb 26 14:56:18 hojuruku kernel: [drm] Chained IB support enabled!
Feb 26 14:56:18 hojuruku kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16
Feb 26 14:56:18 hojuruku kernel: [drm] Found VCE firmware Version: 52.4 Binary ID: 3
Feb 26 14:56:18 hojuruku kernel: amdgpu: [powerplay] 
                                  failed to send message 309 ret is 254 
Feb 26 14:56:18 hojuruku kernel: amdgpu: [powerplay] 
                                  failed to send pre message 14e ret is 254 
Feb 26 14:56:18 hojuruku kernel: [drm] UVD and UVD ENC initialized successfully.
Feb 26 14:56:18 hojuruku kernel: [drm] VCE initialized successfully.
Feb 26 14:56:19 hojuruku kernel: [drm] fb mappable at 0xE0568000
Feb 26 14:56:19 hojuruku kernel: [drm] vram apper at 0xE0000000
Feb 26 14:56:19 hojuruku kernel: [drm] size 8294400
Feb 26 14:56:19 hojuruku kernel: [drm] fb depth is 24
Feb 26 14:56:19 hojuruku kernel: [drm]    pitch is 7680
Feb 26 14:56:19 hojuruku kernel: fbcon: amdgpudrmfb (fb0) is primary device
Feb 26 14:56:21 hojuruku kernel: Console: switching to colour frame buffer device 240x67
Feb 26 14:56:21 hojuruku kernel: amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
Feb 26 14:56:22 hojuruku kernel: [drm] Initialized amdgpu 3.25.0 20150101 for 0000:01:00.0 on minor 0
Feb 26 14:56:22 hojuruku kernel: [drm] amdgpu: finishing device.
Feb 26 14:56:22 hojuruku kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Feb 26 14:56:22 hojuruku kernel: IP: 0xffffffffa01f70a6
Feb 26 14:56:22 hojuruku kernel: PGD 0 P4D 0 
Feb 26 14:56:22 hojuruku kernel: Oops: 0000 [#1] SMP
Feb 26 14:56:22 hojuruku kernel: Modules linked in: amdgpu(+) chash i2c_algo_bit gpu_sched ttm drm_kms_helper drm agpgart syscopyarea sysfillrect sysimgblt fb_sys_fops snd_hda_codec_hdmi snd_hda_codec_realtek s>
Feb 26 14:56:22 hojuruku kernel: CPU: 0 PID: 31 Comm: kworker/0:1 Tainted: G        W        4.15.0-rc4-haswell+ #1
Feb 26 14:56:22 hojuruku kernel: Hardware name: MSI MS-7850/B85-G41 PC Mate(MS-7850), BIOS V2.10B3 02/18/2016
Feb 26 14:56:22 hojuruku kernel: Workqueue: events 0xffffffffa01abe10
Feb 26 14:56:22 hojuruku kernel: RIP: 0010:0xffffffffa01f70a6
Feb 26 14:56:22 hojuruku kernel: RSP: 0018:ffffc900019e3a48 EFLAGS: 00010202
Feb 26 14:56:22 hojuruku kernel: RAX: 0000000000000008 RBX: ffff88040c2f33e8 RCX: 0000000000000000
Feb 26 14:56:22 hojuruku kernel: RDX: 0000000000000000 RSI: 0000000000004332 RDI: ffff88040c2f4c00
Feb 26 14:56:22 hojuruku kernel: RBP: ffff88040c2f3470 R08: 0000000000000002 R09: 0000000000000000
Feb 26 14:56:22 hojuruku kernel: R10: 0000000000000000 R11: 342da7f2f4960343 R12: ffff88040c2f0000
Feb 26 14:56:22 hojuruku kernel: R13: ffff88040d288000 R14: ffff88040d288000 R15: ffffffffa035c240
Feb 26 14:56:22 hojuruku kernel: FS:  0000000000000000(0000) GS:ffff88041dc00000(0000) knlGS:0000000000000000
Feb 26 14:56:22 hojuruku kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 26 14:56:22 hojuruku kernel: CR2: 0000000000000008 CR3: 000000000440a001 CR4: 00000000001606f0
Feb 26 14:56:22 hojuruku kernel: Call Trace:
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa0208575
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa01a9884
Feb 26 14:56:22 hojuruku kernel:  ? 0xffffffffa01aa4a1
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa01aa4a1
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa01fce38
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa015ba3f
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa01ba43e
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa01bc7b3
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa01bc837
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa01bc732
Feb 26 14:56:22 hojuruku kernel:  0xffffffffa01abf95
Feb 26 14:56:22 hojuruku kernel:  0xffffffff81137131
Feb 26 14:56:22 hojuruku kernel:  0xffffffff81137511
Feb 26 14:56:22 hojuruku kernel:  ? 0xffffffff811372c0
Feb 26 14:56:22 hojuruku kernel:  0xffffffff8113c8e6
Feb 26 14:56:22 hojuruku kernel:  ? 0xffffffff8113c7d0
Feb 26 14:56:22 hojuruku kernel:  0xffffffff81c1c97f
Feb 26 14:56:22 hojuruku kernel: Code: 00 00 00 00 00 8b 47 20 85 c0 74 50 53 48 81 ec 28 10 00 00 48 83 0c 24 00 48 81 c4 20 10 00 00 23 47 68 48 8b 57 70 48 8d 04 c2 <48> 8b 18 48 85 db 74 21 8b 13 85 d2 74 1>
Feb 26 14:56:22 hojuruku kernel: RIP: 0xffffffffa01f70a6 RSP: ffffc900019e3a48
Feb 26 14:56:22 hojuruku kernel: CR2: 0000000000000008
Feb 26 14:56:22 hojuruku kernel: ---[ end trace a46140331fdc8070 ]---
Comment 10 Jordan L 2018-02-26 21:46:14 UTC
Hi Luke, it actually looks like you're running with DC disabled. Can you also try with amdgpu.dc=1 explicitly set? Potentially there are issues in multiple IP blocks, though we fixed a driver unload issue recently with DC enabled.


Thanks
Comment 11 Luke McKee 2018-02-28 02:01:35 UTC
I fixed this. I was going to open another ticket. 
Just mentioned it before.

However, dc=1 still isn't usable because of this:
https://bugs.freedesktop.org/show_bug.cgi?id=103953#c7

I found out that the breakage was caused by kernel configuration options. I can provide a working and a broken kconfig, but my money is on it wanting the AMD IOMMU GART support even though the hardware isn't installed. This is loaded on demand by the amdgpu module. Disabling agpgart didn't do much harm, though. Would that affect performance if intel_iommu is enabled?

Feb 28 08:08:35 hojuruku kernel: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
Feb 28 08:08:35 hojuruku kernel: AMD IOMMUv2 functionality not available on this system
Feb 28 08:08:35 hojuruku kernel: CRAT table not found
Feb 28 08:08:35 hojuruku kernel: Virtual CRAT table created for CPU
Feb 28 08:08:35 hojuruku kernel: Parsing CRAT table with 1 nodes


If users sensibly select only the hardware they actually have installed to reduce compile times, or build monolithic kernels, they are going to run into trouble here.

Ask me for the kernel .configs if you need them to replicate the defect. I think that's what triggered the error in my last comment, though in that kernel I had ORC off and no symbols.

You might want to review / tweak your Kconfig depends clauses.
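
For anyone comparing a working and a broken .config, here is a minimal sketch of a diff helper. The function names and the sample option values below are hypothetical and only for illustration; they are not the actual configs from this report, and the kernel tree's own scripts/diffconfig does a similar job.

```python
def parse_kconfig(text):
    """Return {option: value} for a kernel .config; unset options map to 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO = n
            opts[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def diff_kconfig(working_text, broken_text):
    """Return {option: (working_value, broken_value)} for options that differ."""
    working = parse_kconfig(working_text)
    broken = parse_kconfig(broken_text)
    changed = {}
    for name in sorted(set(working) | set(broken)):
        w, b = working.get(name, "n"), broken.get(name, "n")
        if w != b:
            changed[name] = (w, b)
    return changed


if __name__ == "__main__":
    # Made-up sample contents, NOT the configs from this bug.
    working = "CONFIG_DRM_AMDGPU=m\nCONFIG_AMD_IOMMU_V2=m\n"
    broken = "CONFIG_DRM_AMDGPU=m\n# CONFIG_AMD_IOMMU_V2 is not set\n"
    for name, (w, b) in diff_kconfig(working, broken).items():
        print(f"{name}: working={w} broken={b}")
```

Options missing from one file are treated as unset, so an option dropped by a dependency change shows up in the diff as well.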
Comment 12 Jordan L 2018-02-28 17:26:54 UTC
Thanks. Just to clarify, you aren't able to enable DC with your current Kconfig configuration, which prevents you from retrying on this ticket?

If so, please just open a new bug rather than conflate the issue here. Once you're unblocked from testing DC with your configuration, we can look at this issue again. 

Cheers
Comment 13 Sverd Johnsen 2018-04-05 13:12:28 UTC
Still seen with 4.15.15 and dc=1. Not sure whether this always reproduces; I haven't tested this in a while.

[13342.285357] [drm] amdgpu: finishing device.
[13342.288330] amdgpu: [powerplay] 
                failed to send message 30a ret is 254 
[13342.288345] amdgpu: [powerplay] 
                failed to send pre message 26b ret is 254 
[13342.369191] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[13342.369193] IP:           (null)
[13342.369194] PGD 80000003eed8a067 P4D 80000003eed8a067 PUD 3f027a067 PMD 0 
[13342.369197] Oops: 0010 [#1] PREEMPT SMP PTI
[13342.369198] Modules linked in: nfnetlink_log bluetooth ecdh_generic amdgpu(-) chash ttm af_packet macvtap macvlan bonding nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_ipv6 nf_log_ipv4 nf_log_common nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_log nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree nf_tables_inet cls_u32 nf_tables_ipv6 nf_tables_ipv4 sch_htb dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio libcrc32c raid0 x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net tun vhost tap kvm md_mod snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel intel_cstate intel_uncore snd_hda_codec intel_rapl_perf efi_pstore snd_hwdep snd_hda_core mei_me plusb snd_pcm
[13342.369219]  input_leds usbnet mei led_class mii efivars tpm_crb crypto_user efivarfs algif_skcipher af_alg joydev mousedev psmouse atkbd libps2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr tpm_tis tpm_tis_core shpchp thermal fan tpm i8042 acpi_pad vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[13342.369230] CPU: 1 PID: 18251 Comm: rmmod Not tainted 4.15.15-5-ph #2
[13342.369232] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[13342.369233] RIP: 0010:          (null)
[13342.369234] RSP: 0018:ffffb40e025e7d00 EFLAGS: 00010282
[13342.369235] RAX: 0000000000000000 RBX: ffff8ee9c9c1b420 RCX: 0000000180200011
[13342.369236] RDX: 0000000180200012 RSI: 0000000000005c02 RDI: ffff8ee989680f60
[13342.369236] RBP: ffff8ee9ac57da90 R08: 0000000000000001 R09: ffff8ee982daae00
[13342.369237] R10: ffff8ee9ccc01900 R11: 0000000000023610 R12: 0000000000000003
[13342.369238] R13: ffff8ee9ca032f18 R14: 0000000000000000 R15: 0000000000000000
[13342.369239] FS:  00007f738c970b80(0000) GS:ffff8ee9dec80000(0000) knlGS:0000000000000000
[13342.369240] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13342.369241] CR2: 0000000000000000 CR3: 000000033ee74001 CR4: 00000000003606e0
[13342.369241] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13342.369242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13342.369243] Call Trace:
[13342.369258]  ? destroy+0x23/0xb0 [amdgpu]
[13342.369269]  ? dal_i2caux_destruct+0x6a/0xb0 [amdgpu]
[13342.369278]  ? destroy+0x10/0x30 [amdgpu]
[13342.369288]  ? dal_i2caux_destroy+0x1d/0x30 [amdgpu]
[13342.369297]  ? destruct+0x89/0x110 [amdgpu]
[13342.369306]  ? dc_destroy+0xc/0x20 [amdgpu]
[13342.369318]  ? dm_hw_fini+0x19/0x20 [amdgpu]
[13342.369323]  ? amdgpu_fini+0x9c/0x310 [amdgpu]
[13342.369329]  ? amdgpu_device_fini+0x5f/0x1c0 [amdgpu]
[13342.369334]  ? amdgpu_driver_unload_kms+0x45/0x90 [amdgpu]
[13342.369336]  ? drm_dev_unregister+0x3a/0xe0
[13342.369341]  ? amdgpu_pci_remove+0x14/0x40 [amdgpu]
[13342.369344]  ? pci_device_remove+0x36/0xb0
[13342.369346]  ? device_release_driver_internal+0x155/0x220
[13342.369347]  ? driver_detach+0x32/0x70
[13342.369349]  ? bus_remove_driver+0x4c/0xc0
[13342.369350]  ? pci_unregister_driver+0x24/0x90
[13342.369359]  ? amdgpu_exit+0x11/0x3b6 [amdgpu]
[13342.369361]  ? SyS_delete_module+0x19d/0x230
[13342.369363]  ? do_syscall_64+0x5b/0x100
[13342.369365]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[13342.369366] Code:  Bad RIP value.
[13342.369368] RIP:           (null) RSP: ffffb40e025e7d00
[13342.369369] CR2: 0000000000000000
[13342.369370] ---[ end trace 9551ca9b94f5680d ]---
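
The "Oops: 0010" value in the trace above is the x86 page-fault error code, printed in hex. A minimal sketch for decoding it, assuming the standard x86 bit layout (the helper name is hypothetical). Bit 4 (instruction fetch) together with a NULL RIP is consistent with a call through a NULL function pointer, though the trace alone does not prove which pointer.

```python
# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "non-write access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_hex):
    """Decode the hex number printed after 'Oops:' into readable flags."""
    code = int(code_hex, 16)
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            flags.append(text)
    return flags


if __name__ == "__main__":
    for code in ("0010", "0000"):
        print(code, "->", ", ".join(decode_oops_code(code)))
```

The same helper applied to the "Oops: 0000" from the Feb 26 log decodes to a plain kernel-mode read of a non-present page, i.e. a data access rather than a jump to a bad address.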
Comment 14 Sverd Johnsen 2018-04-05 13:16:01 UTC
I just looked at the comments again and based on Comment 2 this seems like expected behavior for now. So my update was kind of pointless.
Comment 15 Sverd Johnsen 2018-06-05 03:39:16 UTC
Seems to be much better or completely (well, almost: bug 106820) solved with 4.17 according to my preliminary tests. Still a problem in 4.16:

[16981.789980] [drm] amdgpu: finishing device.
[16981.901013] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[16981.901017] IP:           (null)
[16981.901018] PGD 0 P4D 0 
[16981.901020] Oops: 0010 [#1] PREEMPT SMP PTI
[16981.901022] Modules linked in: amdgpu(-) chash gpu_sched ttm fuse af_packet macvtap macvlan bonding nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_nat nft_chain_nat_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv6 nf_log_ipv4 nf_log_common nft_log nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_ct nf_conntrack xfrm_user xfrm_algo nft_counter nft_meta nft_set_bitmap nft_set_hash nft_set_rbtree cls_u32 nf_tables_inet sch_htb raid0 intel_pmc_core x86_pkg_temp_thermal intel_powerclamp kvm_intel vhost_net tun vhost tap kvm snd_hda_codec_realtek bcache snd_hda_codec_generic intel_cstate intel_uncore snd_hda_codec_hdmi efi_pstore md_mod snd_hda_intel snd_hda_codec intel_rapl_perf efivars snd_hwdep snd_hda_core mei_me mei plusb snd_pcm input_leds usbnet mii led_class tpm_crb binfmt_misc
[16981.901045]  crypto_user efivarfs algif_skcipher af_alg mousedev joydev psmouse atkbd libps2 tpm_tis crct10dif_pclmul tpm_tis_core crc32_pclmul tpm ghash_clmulni_intel pcspkr rng_core shpchp thermal fan i8042 acpi_pad vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[16981.901055] CPU: 1 PID: 23604 Comm: rmmod Not tainted 4.16.12-5-ph #2
[16981.901056] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[16981.901056] RIP: 0010:          (null)
[16981.901057] RSP: 0018:ffffb8ac03083d08 EFLAGS: 00010286
[16981.901058] RAX: 0000000000000000 RBX: ffffa16af8c39f00 RCX: 000000018020000e
[16981.901059] RDX: 000000018020000f RSI: 0000000000005c02 RDI: ffffa16cc19a8240
[16981.901060] RBP: ffffa16c51c42290 R08: 0000000000000001 R09: ffffa16d0cc01900
[16981.901061] R10: ffffa16b845ef300 R11: ffffe4b3ca133420 R12: ffffa16b89eb4400
[16981.901061] R13: 0000000000000040 R14: ffffffffc09aaf68 R15: dead000000000100
[16981.901062] FS:  00007f3733f98b80(0000) GS:ffffa16d1ec80000(0000) knlGS:0000000000000000
[16981.901063] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16981.901064] CR2: 0000000000000000 CR3: 00000003df3de006 CR4: 00000000003606e0
[16981.901065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[16981.901066] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[16981.901066] Call Trace:
[16981.901087]  ? destroy+0x23/0xb0 [amdgpu]
[16981.901100]  ? dal_i2caux_destruct+0x6a/0xb0 [amdgpu]
[16981.901113]  ? destroy+0x10/0x30 [amdgpu]
[16981.901126]  ? dal_i2caux_destroy+0x1d/0x30 [amdgpu]
[16981.901137]  ? destruct+0x8e/0x110 [amdgpu]
[16981.901148]  ? dc_destroy+0xc/0x20 [amdgpu]
[16981.901162]  ? dm_hw_fini+0x19/0x20 [amdgpu]
[16981.901173]  ? amdgpu_device_ip_fini+0xef/0x30a [amdgpu]
[16981.901184]  ? amdgpu_device_fini+0x68/0x177 [amdgpu]
[16981.901191]  ? amdgpu_driver_unload_kms+0x3d/0x90 [amdgpu]
[16981.901193]  ? drm_dev_unregister+0x3a/0xf0
[16981.901201]  ? amdgpu_pci_remove+0x14/0x40 [amdgpu]
[16981.901203]  ? pci_device_remove+0x36/0xb0
[16981.901205]  ? device_release_driver_internal+0x155/0x220
[16981.901206]  ? driver_detach+0x32/0x63
[16981.901208]  ? bus_remove_driver+0x6f/0xd0
[16981.901209]  ? pci_unregister_driver+0x38/0x90
[16981.901220]  ? amdgpu_exit+0x11/0x1029 [amdgpu]
[16981.901222]  ? SyS_delete_module+0x17f/0x240
[16981.901223]  ? do_syscall_64+0x5b/0x100
[16981.901226]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[16981.901226] Code:  Bad RIP value.
[16981.901230] RIP:           (null) RSP: ffffb8ac03083d08
[16981.901230] CR2: 0000000000000000
[16981.901232] ---[ end trace f3ff8fc93836e132 ]---
Comment 16 Sverd Johnsen 2018-06-17 06:17:35 UTC
hmm this is new

[41889.542562] Console: switching to colour dummy device 80x25
[41890.859216] [drm] amdgpu: finishing device.
[41891.096266] [TTM] Finalizing pool allocator
[41891.100313] [TTM] Finalizing DMA pool allocator
[41891.100326] [TTM] Zone  kernel: Used memory at exit: 0 kiB
[41891.100327] [TTM] Zone   dma32: Used memory at exit: 0 kiB
[41891.100328] [drm] amdgpu: ttm finalized
[41891.108164] BUG: unable to handle kernel paging request at ffffffffc0a31750
[41891.108167] PGD 2dd20a067 P4D 2dd20a067 PUD 2dd20c067 PMD 4897b7067 PTE 0
[41891.108170] Oops: 0010 [#1] PREEMPT SMP PTI
[41891.108171] Modules linked in: chash gpu_sched ttm arc4 md4 md5 sha512_ssse3 sha512_generic cmac cifs ccm nls_iso8859_1 nls_cp437 vfat fat msr rpcsec_gss_krb5 nfsv4 cachefiles dns_resolver nfs lockd grace fscache auth_rpcgss sunrpc af_packet macvtap macvlan bonding nf_log_ipv6 nf_log_ipv4 nf_log_common nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_log nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nft_limit nft_ct nf_conntrack xfrm_user xfrm_algo cls_u32 nft_counter nft_meta nft_set_bitmap sch_htb nft_set_hash nft_set_rbtree raid0 intel_pmc_core x86_pkg_temp_thermal intel_powerclamp kvm_intel md_mod vhost_net tun vhost tap kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi bcache snd_hda_intel snd_hda_codec snd_hwdep intel_cstate intel_uncore snd_hda_core
[41891.108194]  cdc_ether intel_rapl_perf r8152 efi_pstore snd_pcm plusb mei_me usbnet input_leds mei mii led_class efivars tpm_crb binfmt_misc crypto_user efivarfs algif_skcipher af_alg mousedev joydev psmouse crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr tpm_tis tpm_tis_core tpm shpchp thermal fan i8042 rng_core acpi_pad vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio atkbd libps2 [last unloaded: amdgpu]
[41891.108208] CPU: 0 PID: 28737 Comm: zsh Not tainted 4.17.1-2-ph #2
[41891.108208] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD3/Z170X-UD3-CF, BIOS F23d 12/01/2017
[41891.108210] RIP: 0010:0xffffffffc0a31750
[41891.108211] RSP: 0018:ffff912fdec03f10 EFLAGS: 00010292
[41891.108212] RAX: ffffffffc0a31750 RBX: ffff912fdec20200 RCX: ffff912dbf11f390
[41891.108213] RDX: ffff912f0fd6b990 RSI: ffff912fdec03f20 RDI: ffff912dbf11fd90
[41891.108214] RBP: ffffffffa9223480 R08: ffff912fdec1bc00 R09: 0000000000000100
[41891.108215] R10: 0000000000000080 R11: 00002619991f4040 R12: ffff912fdec20238
[41891.108215] R13: 000000000000000a R14: 7fffffffffffffff R15: 0000000000000202
[41891.108216] FS:  00007febd04cff00(0000) GS:ffff912fdec00000(0000) knlGS:0000000000000000
[41891.108217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41891.108218] CR2: ffffffffc0a31750 CR3: 0000000468368002 CR4: 00000000003606f0
[41891.108219] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[41891.108220] Call Trace:
[41891.108220] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[41891.108222]  <IRQ>
[41891.108224]  ? rcu_process_callbacks+0x1f9/0x3c0
[41891.108226]  ? __do_softirq+0xd0/0x1f4
[41891.108228]  ? irq_exit+0x7c/0xb0
[41891.108229]  ? smp_apic_timer_interrupt+0x59/0x90
[41891.108231]  </IRQ>
[41891.108231]  ? apic_timer_interrupt+0xf/0x20
[41891.108233]  ? privileged_wrt_inode_uidgid+0x12/0x30
[41891.108235]  ? generic_permission+0xf4/0x190
[41891.108236]  ? inode_permission+0x24/0x130
[41891.108237]  ? link_path_walk+0x6c/0x530
[41891.108239]  ? path_lookupat.isra.10+0x92/0x200
[41891.108241]  ? unmap_page_range+0x5ed/0x890
[41891.108242]  ? filename_lookup.part.18+0x9b/0x170
[41891.108244]  ? __check_object_size+0xf6/0x17b
[41891.108246]  ? strncpy_from_user+0x4c/0x170
[41891.108247]  ? vfs_statx+0x6e/0xd0
[41891.108249]  ? __audit_syscall_exit+0x22b/0x2a0
[41891.108250]  ? __se_sys_newstat+0x39/0x70
[41891.108251]  ? syscall_trace_enter+0x1d9/0x240
[41891.108253]  ? vm_munmap+0x64/0x90
[41891.108254]  ? do_syscall_64+0x43/0xf0
[41891.108255]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[41891.108256] Code:  Bad RIP value.
[41891.108259] RIP: 0xffffffffc0a31750 RSP: ffff912fdec03f10
[41891.108260] CR2: ffffffffc0a31750
[41891.108262] ---[ end trace 51bac120fccff9bb ]---
[41891.230170] Kernel panic - not syncing: Fatal exception in interrupt
[41891.230175] Kernel Offset: 0x27000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Comment 17 Martin Peres 2019-11-19 08:27:26 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/274.

