I am monitoring an HP laptop via ssh to try to catch a lockup problem. I am not sure which component to select for the bug report. Here is output from dmesg, captured while running glxgears. It has not locked up yet, but I spotted this first. I will post more as I find it. Please advise what other info is needed.

[Aside: I think the PCIe Bus Error is unrelated, but I have included it so others can judge.]

[ 270.207119] pcieport 0000:00:01.7: AER: Multiple Corrected error received: id=0008
[ 270.207136] pcieport 0000:00:01.7: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000f(Transmitter ID)
[ 270.207144] pcieport 0000:00:01.7: device [1022:15d3] error status/mask=00001000/00006000
[ 270.207149] pcieport 0000:00:01.7: [12] Replay Timer Timeout
[ 397.899405] pcieport 0000:00:01.7: AER: Multiple Corrected error received: id=0008
[ 397.899426] pcieport 0000:00:01.7: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000f(Transmitter ID)
[ 397.899434] pcieport 0000:00:01.7: device [1022:15d3] error status/mask=00001000/00006000
[ 397.899439] pcieport 0000:00:01.7: [12] Replay Timer Timeout
[ 793.776505] pcieport 0000:00:01.7: AER: Multiple Corrected error received: id=0008
[ 793.776524] pcieport 0000:00:01.7: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000f(Transmitter ID)
[ 793.776532] pcieport 0000:00:01.7: device [1022:15d3] error status/mask=00001000/00006000
[ 793.776537] pcieport 0000:00:01.7: [12] Replay Timer Timeout
[ 797.012006] nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based firewall rule not found. Use the iptables CT target to attach helpers instead.
[ 1079.061454] pcieport 0000:00:01.7: AER: Corrected error received: id=0008
[ 1079.061469] pcieport 0000:00:01.7: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000f(Transmitter ID)
[ 1079.061478] pcieport 0000:00:01.7: device [1022:15d3] error status/mask=00001000/00006000
[ 1079.061483] pcieport 0000:00:01.7: [12] Replay Timer Timeout
[ 1079.061489] pcieport 0000:00:01.7: AER: Corrected error received: id=0008
[ 1079.061503] pcieport 0000:00:01.7: can't find device of ID0008
[ 1145.211182] pcieport 0000:00:01.7: AER: Corrected error received: id=0008
[ 1145.211196] pcieport 0000:00:01.7: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000f(Transmitter ID)
[ 1145.211214] pcieport 0000:00:01.7: device [1022:15d3] error status/mask=00001000/00006000
[ 1145.211220] pcieport 0000:00:01.7: [12] Replay Timer Timeout
[ 1145.211229] pcieport 0000:00:01.7: AER: Corrected error received: id=0008
[ 1145.211239] pcieport 0000:00:01.7: can't find device of ID0008
[ 1350.594831] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 1us * 10 tries - optc1_lock line:553
[ 1350.594955] WARNING: CPU: 3 PID: 1828 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:195 generic_reg_wait+0xf3/0x170 [amdgpu]
[ 1350.594956] Modules linked in: ccm fuse rfcomm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle cmac iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc vfat fat arc4 r8822be(C) hp_wmi sparse_keymap wmi_bmof edac_mce_amd kvm_amd ccp kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 snd_hda_intel btusb irqbypass btrtl crct10dif_pclmul crc32_pclmul btbcm
[ 1350.594989] btintel snd_hda_codec bluetooth hid_sensor_accel_3d hid_sensor_incl_3d hid_sensor_gyro_3d ghash_clmulni_intel uvcvideo hid_sensor_rotation hid_sensor_magn_3d snd_hda_core hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer videobuf2_vmalloc videobuf2_memops kfifo_buf videobuf2_v4l2 snd_hwdep videobuf2_common industrialio snd_seq videodev cfg80211 snd_seq_device ecdh_generic joydev snd_pcm media rtsx_pci_ms memstick rfkill snd_timer snd sp5100_tco soundcore shpchp i2c_piix4 k10temp tpm_crb wmi tpm_tis hp_accel tpm_tis_core lis3lv02d tpm i2c_scmi video hp_wireless input_polldev pinctrl_amd acpi_cpufreq amdkfd hid_sensor_hub amd_iommu_v2 amdgpu hid_logitech_hidpp chash i2c_algo_bit gpu_sched drm_kms_helper ttm rtsx_pci_sdmmc drm mmc_core crc32c_intel nvme serio_raw nvme_core
[ 1350.595018] rtsx_pci i2c_hid hid_logitech_dj
[ 1350.595023] CPU: 3 PID: 1828 Comm: gnome-shell Tainted: G C 4.16.11-300.fc28.x86_64 #1
[ 1350.595024] Hardware name: HP HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.17 03/29/2018
[ 1350.595064] RIP: 0010:generic_reg_wait+0xf3/0x170 [amdgpu]
[ 1350.595065] RSP: 0018:ffffbf7048407948 EFLAGS: 00010297
[ 1350.595066] RAX: 0000000000000229 RBX: 0000000000000001 RCX: 0000000000000000
[ 1350.595067] RDX: 0000000000000000 RSI: ffff9fd2decd6938 RDI: ffff9fd2decd6938
[ 1350.595068] RBP: ffff9fd2cb352a00 R08: 0000000000000005 R09: 000000000000042b
[ 1350.595068] R10: 0000000000000001 R11: ffffffff9d9751ed R12: 000000000000000b
[ 1350.595069] R13: 000000000000504d R14: 0000000000000100 R15: 0000000000000001
[ 1350.595071] FS: 00007f2b0357f280(0000) GS:ffff9fd2decc0000(0000) knlGS:0000000000000000
[ 1350.595072] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1350.595072] CR2: 000055e154e8f3a8 CR3: 00000001ecaa0000 CR4: 00000000003406e0
[ 1350.595073] Call Trace:
[ 1350.595121] optc1_lock+0xa0/0xb0 [amdgpu]
[ 1350.595165] dcn10_apply_ctx_for_surface+0xdf/0x13f0 [amdgpu]
[ 1350.595173] ? __alloc_pages_nodemask+0x11e/0x2b0
[ 1350.595175] ? free_one_page+0x3d6/0x510
[ 1350.595214] dc_commit_state+0x262/0x560 [amdgpu]
[ 1350.595252] ? mod_freesync_set_user_enable+0x11b/0x150 [amdgpu]
[ 1350.595295] amdgpu_dm_atomic_commit_tail+0x373/0xd90 [amdgpu]
[ 1350.595329] ? amdgpu_bo_pin_restricted+0x1cb/0x2c0 [amdgpu]
[ 1350.595333] ? _cond_resched+0x15/0x30
[ 1350.595335] ? wait_for_completion_timeout+0x3a/0x190
[ 1350.595336] ? wait_for_completion_interruptible+0x35/0x1d0
[ 1350.595347] commit_tail+0x3d/0x70 [drm_kms_helper]
[ 1350.595354] drm_atomic_helper_commit+0x103/0x110 [drm_kms_helper]
[ 1350.595372] drm_atomic_connector_commit_dpms+0xdb/0x100 [drm]
[ 1350.595384] drm_mode_obj_set_property_ioctl+0x178/0x280 [drm]
[ 1350.595394] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[ 1350.595403] drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 1350.595413] drm_ioctl+0x1c0/0x380 [drm]
[ 1350.595424] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[ 1350.595428] ? eventfd_read+0xe6/0x290
[ 1350.595459] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 1350.595463] do_vfs_ioctl+0xa4/0x610
[ 1350.595465] SyS_ioctl+0x74/0x80
[ 1350.595469] do_syscall_64+0x74/0x180
[ 1350.595472] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1350.595473] Code: d9 48 c7 c2 90 81 82 c0 48 c7 c7 9c fd 82 c0 50 4c 8b 4c 24 58 44 8b 44 24 50 e8 c9 c8 de ff 83 7d 20 01 58 44 8b 54 24 08 74 02 <0f> 0b 48 83 c4 10 44 89 d0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41
[ 1350.595495] ---[ end trace 3336f495b7ef729e ]---
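
For anyone triaging a similar log, the amdgpu REG_WAIT timeouts can be pulled out of saved dmesg text with a short script. This is only a minimal sketch: the function name `find_reg_wait_timeouts` and the regular expression are illustrative assumptions modeled on the message format seen in this report, not part of any kernel or amdgpu tooling, and other kernel versions may word the message differently.

```python
import re

# Pattern modeled on the message seen in this report, e.g.:
#   *ERROR* REG_WAIT timeout 1us * 10 tries - optc1_lock line:553
# The exact wording is an assumption and may differ between kernel versions.
REG_WAIT_RE = re.compile(
    r"REG_WAIT timeout (?P<delay>\d+)us \* (?P<tries>\d+) tries"
    r" - (?P<func>\w+) line:(?P<line>\d+)"
)


def find_reg_wait_timeouts(dmesg_text):
    """Return one dict per REG_WAIT timeout message found in dmesg_text."""
    hits = []
    for match in REG_WAIT_RE.finditer(dmesg_text):
        hits.append({
            "delay_us": int(match.group("delay")),
            "tries": int(match.group("tries")),
            "function": match.group("func"),
            "line": int(match.group("line")),
        })
    return hits


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for hit in find_reg_wait_timeouts(sys.stdin.read()):
        print(hit)
```

Because the pattern is searched anywhere in the text, it should also work on logs whose line breaks were lost when pasting.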
Please post some information about your hardware and the kernel version you're running. Full dmesg is also good, shows what the driver says when it's loading.
Created attachment 139819: Full dmesg text (dmesg output)
(In reply to Ernst Sjöstrand from comment #1)
> Please post some information about your hardware and the kernel version
> you're running. Full dmesg is also good, shows what the driver says when
> it's loading.

I have attached the dmesg text. I also tried using ssh from a remote machine to see what is going on, but the ssh session locks up completely as well. I can reproduce this hang by running glxgears with vblank_mode=0. The time it takes is random, roughly 10 to 30 minutes.
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: X.Org (0x1002)
    Device: AMD RAVEN (DRM 3.23.0 / 4.16.12-300.fc28.x86_64, LLVM 6.0.0) (0x15dd)
    Version: 18.0.2
    Accelerated: yes
    Video memory: 223MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 223 MB, largest block: 223 MB
    VBO free aux. memory - total: 3067 MB, largest block: 3067 MB
    Texture free memory - total: 223 MB, largest block: 223 MB
    Texture free aux. memory - total: 3067 MB, largest block: 3067 MB
    Renderbuffer free memory - total: 223 MB, largest block: 223 MB
    Renderbuffer free aux. memory - total: 3067 MB, largest block: 3067 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 223 MB
    Total available memory: 3291 MB
    Currently available dedicated video memory: 223 MB
OpenGL vendor string: X.Org
OpenGL renderer string: AMD RAVEN (DRM 3.23.0 / 4.16.12-300.fc28.x86_64, LLVM 6.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.0.2
OpenGL core profile shading language version string: 4.50
I have the same issue.

AMD Ryzen 1800X
Sapphire Vega 56
amdgpu, git kernel 4.17.0

This is what dmesg says from time to time, apart from the hangs:

[Sa Jun 9 17:34:54 2018] [drm:generic_reg_wait] *ERROR* REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:563
[Sa Jun 9 17:34:54 2018] WARNING: CPU: 14 PID: 175 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:195 generic_reg_wait+0xe2/0x160
[Sa Jun 9 17:34:54 2018] Modules linked in: vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O)
[Sa Jun 9 17:34:54 2018] CPU: 14 PID: 175 Comm: kworker/14:1 Tainted: G O 4.17.0 #1
[Sa Jun 9 17:34:54 2018] Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
[Sa Jun 9 17:34:54 2018] Workqueue: events dm_irq_work_func
[Sa Jun 9 17:34:54 2018] RIP: 0010:generic_reg_wait+0xe2/0x160
[Sa Jun 9 17:34:54 2018] RSP: 0018:ffffba1941e9fa88 EFLAGS: 00010297
[Sa Jun 9 17:34:54 2018] RAX: 0000000000000000 RBX: 0000000000000dad RCX: 0000000000000000
[Sa Jun 9 17:34:54 2018] RDX: 0000000000000000 RSI: ffff985a9ef953b8 RDI: ffff985a9ef953b8
[Sa Jun 9 17:34:54 2018] RBP: 000000000000000a R08: 0000000000000416 R09: 0000000000000002
[Sa Jun 9 17:34:54 2018] R10: 0000000000000002 R11: 0000000000000001 R12: ffff985a8c8b5280
[Sa Jun 9 17:34:54 2018] R13: 00000000000035af R14: 0000000000000010 R15: 0000000000000001
[Sa Jun 9 17:34:54 2018] FS: 0000000000000000(0000) GS:ffff985a9ef80000(0000) knlGS:0000000000000000
[Sa Jun 9 17:34:54 2018] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sa Jun 9 17:34:54 2018] CR2: 000055e803d42c78 CR3: 00000003c621c000 CR4: 00000000003406e0
[Sa Jun 9 17:34:54 2018] Call Trace:
[Sa Jun 9 17:34:54 2018] dce_mi_free_dmif+0x11c/0x1a0
[Sa Jun 9 17:34:54 2018] dce110_reset_hw_ctx_wrap+0x13b/0x1c0
[Sa Jun 9 17:34:54 2018] dce110_apply_ctx_to_hw+0x51/0x8c0
[Sa Jun 9 17:34:54 2018] ? amdgpu_pm_compute_clocks+0xa2/0x570
[Sa Jun 9 17:34:54 2018] dc_commit_state+0x333/0x5f0
[Sa Jun 9 17:34:54 2018] ? set_freesync_on_streams.part.6+0x48/0x240
[Sa Jun 9 17:34:54 2018] ? mod_freesync_set_user_enable+0x116/0x140
[Sa Jun 9 17:34:54 2018] amdgpu_dm_atomic_commit_tail+0x359/0xd10
[Sa Jun 9 17:34:54 2018] ? amdgpu_bo_pin_restricted+0x227/0x2e0
[Sa Jun 9 17:34:54 2018] ? _cond_resched+0x10/0x40
[Sa Jun 9 17:34:54 2018] ? wait_for_completion_timeout+0x2f/0x130
[Sa Jun 9 17:34:54 2018] ? _cond_resched+0x10/0x40
[Sa Jun 9 17:34:54 2018] ? wait_for_completion_interruptible+0x2c/0x160
[Sa Jun 9 17:34:54 2018] ? dm_plane_helper_prepare_fb+0xea/0x290
[Sa Jun 9 17:34:54 2018] commit_tail+0x38/0x70
[Sa Jun 9 17:34:54 2018] drm_atomic_helper_commit+0x11c/0x130
[Sa Jun 9 17:34:54 2018] dm_restore_drm_connector_state+0x100/0x190
[Sa Jun 9 17:34:54 2018] handle_hpd_irq+0x81/0xa0
[Sa Jun 9 17:34:54 2018] dm_irq_work_func+0x49/0x60
[Sa Jun 9 17:34:54 2018] process_one_work+0x1cc/0x3c0
[Sa Jun 9 17:34:54 2018] worker_thread+0x26/0x3f0
[Sa Jun 9 17:34:54 2018] ? trace_event_raw_event_workqueue_execute_start+0xc0/0xc0
[Sa Jun 9 17:34:54 2018] kthread+0x10e/0x130
[Sa Jun 9 17:34:54 2018] ? kthread_create_worker_on_cpu+0x70/0x70
[Sa Jun 9 17:34:54 2018] ret_from_fork+0x22/0x40
[Sa Jun 9 17:34:54 2018] Code: 24 58 48 8b 4c 24 50 89 ee 8b 54 24 48 48 c7 c7 48 1d 4b 9a 44 89 4c 24 08 e8 6b 70 eb ff 41 83 7c 24 20 01 44 8b 4c 24 08 74 02 <0f> 0b 48 83 c4 10 44 89 c8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f
[Sa Jun 9 17:34:54 2018] ---[ end trace b03679a92b01c897 ]---

I can't provide logs from the hangs themselves because I can't get into the machine via ssh.
I used grubby to add 'idle=nomwait' to my kernel boot command line, and the problem seems resolved. The mwait instruction is known to possibly hang threads on some earlier Ryzen chips, as documented in the AMD errata.
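
To confirm after a reboot that the parameter actually took effect, the running kernel's command line can be read from /proc/cmdline. The following is a minimal sketch, assuming a Linux system; the helper names `has_kernel_arg` and `read_cmdline` are hypothetical and used only for illustration.

```python
def has_kernel_arg(cmdline, arg):
    """Return True if arg appears as a whole, space-separated token in cmdline."""
    return arg in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    if has_kernel_arg(read_cmdline(), "idle=nomwait"):
        print("idle=nomwait is active")
    else:
        print("idle=nomwait is NOT on the kernel command line")
```

Comparing whole tokens rather than substrings avoids a false match on a longer parameter that merely contains the same text.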
(In reply to JerryD from comment #6)
> I used grubby to add to my kernel boot command 'idle=nomwait' and the
> problem seems resolved. The mwait instruction is known to possibly hang
> threads on some earlier released ryzen chips as documented in the AMD Errata.

Thanks for the follow-up, resolving accordingly.