This is on my parents' machine; I only have remote access, and not continuously, so providing additional information may be slow.

Since the latest kernel upgrade, from

Linux version 4.13.0-43-generic (buildd@lgw01-amd64-026) (gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2)) #48-Ubuntu SMP Wed May 16 12:18:48 UTC 2018 (Ubuntu 4.13.0-43.48-generic 4.13.16)

to

Linux version 4.15.0-23-generic (buildd@lgw01-amd64-055) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 (Ubuntu 4.15.0-23.25-generic 4.15.18)

the machine appears to freeze soon after booting. Only X is affected; the machine is still reachable via ssh.

I filed this bug on Ubuntu Launchpad using apport-cli while running the older (working) kernel, not the newer (failing) one.

In /var/log/kernel I can see the following:

Jun 15 23:26:23 xa-xubu kernel: [ 2417.562386] INFO: task Xorg:757 blocked for more than 120 seconds.
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562396] Not tainted 4.15.0-23-generic #25-Ubuntu
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562403] Xorg D 0 757 724 0x00400004
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562408] Call Trace:
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562424] __schedule+0x297/0x8b0
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562430] ? __kfifo_in+0x37/0x50
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562434] schedule+0x2c/0x80
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562559] amd_sched_entity_push_job+0xad/0xf0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562565] ? wait_woken+0x80/0x80
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562653] amdgpu_job_submit+0x9f/0xc0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562723] amdgpu_vm_bo_update_mapping+0x389/0x3f0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562793] ? amdgpu_vm_it_iter_first+0x40/0x40 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562863] amdgpu_vm_bo_update+0x325/0x5b0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562930] amdgpu_gem_va_ioctl+0x524/0x540 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562962] ? drm_gem_handle_create_tail+0x120/0x190 [drm]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563028] ? amdgpu_gem_create_ioctl+0xc1/0x270 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563096] ? amdgpu_gem_metadata_ioctl+0x1c0/0x1c0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563115] drm_ioctl_kernel+0x5f/0xb0 [drm]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563134] ? drm_ioctl_kernel+0x5f/0xb0 [drm]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563154] drm_ioctl+0x31b/0x3d0 [drm]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563220] ? amdgpu_gem_metadata_ioctl+0x1c0/0x1c0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563225] ? update_load_avg+0x57f/0x6e0
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563231] ? futex_wake+0x8f/0x180
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563290] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563296] do_vfs_ioctl+0xa8/0x630
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563300] ? __schedule+0x29f/0x8b0
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563304] SyS_ioctl+0x79/0x90
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563309] do_syscall_64+0x73/0x130
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563313] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563317] RIP: 0033:0x7fbd7ddcf5d7
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563319] RSP: 002b:00007fff67e69aa8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563322] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fbd7ddcf5d7
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563324] RDX: 00007fff67e69af0 RSI: 00000000c0286448 RDI: 000000000000000e
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563326] RBP: 00007fff67e69af0 R08: 0000000101440000 R09: 000000000000000a
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563328] R10: 0000000000000039 R11: 0000000000003246 R12: 00000000c0286448
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563330] R13: 000000000000000e R14: 000055e820965f20 R15: 0

This seems to point to the in-kernel AMD GPU driver (amdgpu). Since the problem seems to disappear when switching back to the previous kernel, I filed this as a kernel bug.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-23-generic 4.15.0-23.25
ProcVersionSignature: Ubuntu 4.13.0-43.48-generic 4.13.16
Uname: Linux 4.13.0-43-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: rhialto 21222 F.... pulseaudio
 /dev/snd/controlC0: rhialto 21222 F.... pulseaudio
Date: Sat Jun 16 16:03:46 2018
InstallationDate: Installed on 2017-10-29 (230 days ago)
InstallationMedia: Xubuntu 17.10 "Artful Aardvark" - Release amd64 (20171017.1)
IwConfig:
 enp1s0 no wireless extensions.
 lo no wireless extensions.
MachineType: LENOVO 90G9001RNY
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-43-generic.efi.signed root=UUID=103b739d-56cf-440b-a2b4-fc955e1a0a41 ro quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-43-generic N/A
 linux-backports-modules-4.13.0-43-generic N/A
 linux-firmware 1.173.1
RfKill:
 0: hci0: Bluetooth
 Soft blocked: no
 Hard blocked: no
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-06-09 (7 days ago)
dmi.bios.date: 12/29/2016
dmi.bios.vendor: LENOVO
dmi.bios.version: O2HKT24A
dmi.board.name: Jadeite CRB
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40700 WIN 3258076524150
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnLENOVO:bvrO2HKT24A:bd12/29/2016:svnLENOVO:pn90G9001RNY:pvrideacentre310S-08ASR:rvnLENOVO:rnJadeiteCRB:rvrSDK0J40700WIN3258076524150:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: ideacentre 310S-08ASR
dmi.product.name: 90G9001RNY
dmi.product.version: ideacentre 310S-08ASR
dmi.sys.vendor: LENOVO

Part of the lspci output, showing the graphics hardware and IOMMU:

00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device [1022:1577]
    Subsystem: Lenovo Device [17aa:364f]
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 24
    Capabilities: <access denied>

00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:98e4] (rev c8) (prog-if 00 [VGA controller])
    Subsystem: Lenovo Device [17aa:364f]
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 225
    Region 0: Memory at e8000000 (64-bit, prefetchable) [size=128M]
    Region 2: Memory at f0000000 (64-bit, prefetchable) [size=8M]
    Region 4: I/O ports at f000 [size=256]
    Region 5: Memory at feb00000 (32-bit, non-prefetchable) [size=256K]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: amdgpu
    Kernel modules: amdgpu

For more detailed info, such as the full output from lspci, see the attachments at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777245

I filed the bug there first (because the bug occurred after a kernel update), but I am echoing it here since this may be a more targeted audience.
It seems the SW queue used to submit commands to the HW is full, so Xorg is stuck waiting for space in the queue before its new command can be inserted. We have changed the architecture of the SW scheduler since then. Are you able to build the latest stable kernel from https://www.kernel.org/ and see if the problem goes away?
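To illustrate the kind of stall described above, here is a minimal user-space sketch in Python. It is not the amdgpu scheduler code; `sw_queue`, `try_submit` and the queue depth of 4 are hypothetical names and values chosen for the example. It only shows that a producer pushing into a bounded queue cannot make progress once the consumer stops draining it.

```python
import queue


def try_submit(q, job, timeout):
    """Try to queue a job; return False if the queue stayed full for `timeout` seconds."""
    try:
        q.put(job, timeout=timeout)
        return True
    except queue.Full:
        return False


# Hypothetical stand-in for the software queue that feeds the hardware.
sw_queue = queue.Queue(maxsize=4)

# Nothing drains the queue here (a stalled consumer), so the first four
# submissions fit and the fifth finds no free slot.
results = [try_submit(sw_queue, f"job-{i}", timeout=0.1) for i in range(5)]
print(results)  # [True, True, True, True, False]

# Once the consumer takes one job off the queue, a submission succeeds again.
sw_queue.get()
print(try_submit(sw_queue, "job-5", timeout=0.1))  # True
```

The sketch uses a timeout so that it terminates. In the trace above there is no such escape: Xorg sits in uninterruptible sleep (state D) inside amd_sched_entity_push_job until the hung-task detector reports it after 120 seconds.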
On the Launchpad ticket I got a request to try a pre-built mainline kernel from Ubuntu. So I had this one tested:

Linux xa-xubu 4.18.0-041800rc1-generic #201806162031 SMP Sun Jun 17 00:34:22 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

and I now have the report that it works. Hopefully this kernel is close enough to the one you meant.

So it seems that the bug is indeed fixed by the changes you mention. Now this should still trickle into the general Ubuntu kernels, I suppose.
(In reply to Olaf 'Rhialto' Seibert from comment #2)
> On the Launchpad ticket I got a request to try a pre-built mainline kernel
> from Ubuntu. So I had this one tested:
> Linux xa-xubu 4.18.0-041800rc1-generic #201806162031 SMP Sun Jun 17 00:34:22
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> and I now have the report that it works. Hopefully this kernel is close
> enough to the one you meant.
>
> So it seems that the bug is indeed fixed by the changes you mention.
> Now this should still trickle into the general Ubuntu kernels I suppose.

Good to know. Yes, this is upstream already and should get to Ubuntu once they update their distro versions. I moved the ticket to resolved.

Andrey