Bug 106942 - X freezes with Ubuntu kernel 4.15.0-23-generic (AMDGPU)
Summary: X freezes with Ubuntu kernel 4.15.0-23-generic (AMDGPU)
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
QA Contact:
URL: https://bugs.launchpad.net/ubuntu/+so...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-17 14:56 UTC by Olaf 'Rhialto' Seibert
Modified: 2018-06-27 21:24 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:



Description Olaf 'Rhialto' Seibert 2018-06-17 14:56:22 UTC
This is on my parents' machine. I only have remote access, and not continuously, so it may take a while before I can provide additional information.

Since the latest kernel upgrade, from
 Linux version 4.13.0-43-generic (buildd@lgw01-amd64-026) (gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2)) #48-Ubuntu SMP Wed May 16 12:18:48 UTC 2018 (Ubuntu 4.13.0-43.48-generic 4.13.16)
to
Linux version 4.15.0-23-generic (buildd@lgw01-amd64-055) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 (Ubuntu 4.15.0-23.25-generic 4.15.18)
the machine appears to freeze some time soon after booting. Only X is affected; the machine is still reachable via ssh.

I filed this bug on Ubuntu Launchpad using apport-cli while running the older (working) kernel, not the newer (failing) one.

In /var/log/kernel I can see the following:

Jun 15 23:26:23 xa-xubu kernel: [ 2417.562386] INFO: task Xorg:757 blocked for more than 120 seconds.
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562396] Not tainted 4.15.0-23-generic #25-Ubuntu
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562403] Xorg D 0 757 724 0x00400004
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562408] Call Trace:
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562424] __schedule+0x297/0x8b0
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562430] ? __kfifo_in+0x37/0x50
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562434] schedule+0x2c/0x80
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562559] amd_sched_entity_push_job+0xad/0xf0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562565] ? wait_woken+0x80/0x80
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562653] amdgpu_job_submit+0x9f/0xc0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562723] amdgpu_vm_bo_update_mapping+0x389/0x3f0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562793] ? amdgpu_vm_it_iter_first+0x40/0x40 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562863] amdgpu_vm_bo_update+0x325/0x5b0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562930] amdgpu_gem_va_ioctl+0x524/0x540 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.562962] ? drm_gem_handle_create_tail+0x120/0x190 [drm]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563028] ? amdgpu_gem_create_ioctl+0xc1/0x270 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563096] ? amdgpu_gem_metadata_ioctl+0x1c0/0x1c0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563115] drm_ioctl_kernel+0x5f/0xb0 [drm]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563134] ? drm_ioctl_kernel+0x5f/0xb0 [drm]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563154] drm_ioctl+0x31b/0x3d0 [drm]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563220] ? amdgpu_gem_metadata_ioctl+0x1c0/0x1c0 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563225] ? update_load_avg+0x57f/0x6e0
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563231] ? futex_wake+0x8f/0x180
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563290] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563296] do_vfs_ioctl+0xa8/0x630
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563300] ? __schedule+0x29f/0x8b0
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563304] SyS_ioctl+0x79/0x90
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563309] do_syscall_64+0x73/0x130
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563313] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563317] RIP: 0033:0x7fbd7ddcf5d7
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563319] RSP: 002b:00007fff67e69aa8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563322] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fbd7ddcf5d7
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563324] RDX: 00007fff67e69af0 RSI: 00000000c0286448 RDI: 000000000000000e
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563326] RBP: 00007fff67e69af0 R08: 0000000101440000 R09: 000000000000000a
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563328] R10: 0000000000000039 R11: 0000000000003246 R12: 00000000c0286448
Jun 15 23:26:23 xa-xubu kernel: [ 2417.563330] R13: 000000000000000e R14: 000055e820965f20 R15: 0

which seems to point to the in-kernel AMD GPU driver (amdgpu).
Since the problem seems to disappear when switching back to the previous kernel, I filed this as a kernel bug.
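The register dump at the end of the trace is consistent with this. On x86-64, ORIG_RAX 0x10 is the `ioctl` system call, RDI (0xe) is its file-descriptor argument (presumably the opened DRM device), and RSI holds the request number `0xc0286448`. The sketch below decodes that number, assuming the generic Linux `_IOC` bit layout (8 bits command number, 8 bits type, 14 bits size, 2 bits direction). The DRM constants are copied from the DRM and amdgpu uAPI headers and should be double-checked against the headers of the kernel in question. The result, type `'d'` (DRM) and driver command 0x08 (`DRM_AMDGPU_GEM_VA`), matches the `amdgpu_gem_va_ioctl` frame in the trace.

```python
# Decode a Linux ioctl request number using the generic _IOC bit layout
# (as used on x86-64): bits 0-7 = nr, 8-15 = type, 16-29 = size, 30-31 = dir.
IOC_NRSHIFT = 0
IOC_TYPESHIFT = 8
IOC_SIZESHIFT = 16
IOC_DIRSHIFT = 30

IOC_WRITE = 1  # user space passes data into the kernel
IOC_READ = 2   # kernel passes data back to user space

DRM_IOCTL_BASE = ord("d")  # 0x64, the type byte of all DRM ioctls
DRM_COMMAND_BASE = 0x40    # first driver-specific DRM command number
DRM_AMDGPU_GEM_VA = 0x08   # amdgpu command index for GEM virtual-address ops


def decode_ioctl(cmd: int) -> dict:
    """Split an ioctl request number into its _IOC fields."""
    direction = (cmd >> IOC_DIRSHIFT) & 0x3
    return {
        "write": bool(direction & IOC_WRITE),
        "read": bool(direction & IOC_READ),
        "type": (cmd >> IOC_TYPESHIFT) & 0xFF,
        "nr": (cmd >> IOC_NRSHIFT) & 0xFF,
        "size": (cmd >> IOC_SIZESHIFT) & 0x3FFF,
    }


if __name__ == "__main__":
    # RSI from the hung-task report holds the ioctl request number.
    fields = decode_ioctl(0xC0286448)
    print(fields)
    driver_cmd = fields["nr"] - DRM_COMMAND_BASE
    print("DRM ioctl:", fields["type"] == DRM_IOCTL_BASE,
          "| driver command:", hex(driver_cmd),
          "| GEM_VA:", driver_cmd == DRM_AMDGPU_GEM_VA)
```

The 40-byte size field is the size of the argument structure that user space passed with the request.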

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-23-generic 4.15.0-23.25
ProcVersionSignature: Ubuntu 4.13.0-43.48-generic 4.13.16
Uname: Linux 4.13.0-43-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: rhialto 21222 F.... pulseaudio
 /dev/snd/controlC0: rhialto 21222 F.... pulseaudio
Date: Sat Jun 16 16:03:46 2018
InstallationDate: Installed on 2017-10-29 (230 days ago)
InstallationMedia: Xubuntu 17.10 "Artful Aardvark" - Release amd64 (20171017.1)
IwConfig:
 enp1s0 no wireless extensions.

 lo no wireless extensions.
MachineType: LENOVO 90G9001RNY
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-43-generic.efi.signed root=UUID=103b739d-56cf-440b-a2b4-fc955e1a0a41 ro quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-43-generic N/A
 linux-backports-modules-4.13.0-43-generic N/A
 linux-firmware 1.173.1
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-06-09 (7 days ago)
dmi.bios.date: 12/29/2016
dmi.bios.vendor: LENOVO
dmi.bios.version: O2HKT24A
dmi.board.name: Jadeite CRB
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40700 WIN 3258076524150
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnLENOVO:bvrO2HKT24A:bd12/29/2016:svnLENOVO:pn90G9001RNY:pvrideacentre310S-08ASR:rvnLENOVO:rnJadeiteCRB:rvrSDK0J40700WIN3258076524150:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: ideacentre 310S-08ASR
dmi.product.name: 90G9001RNY
dmi.product.version: ideacentre 310S-08ASR
dmi.sys.vendor: LENOVO

Excerpt from lspci, showing the graphics hardware and the IOMMU:

00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device [1022:1577]
	Subsystem: Lenovo Device [17aa:364f]
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 24
	Capabilities: <access denied>

00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:98e4] (rev c8) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device [17aa:364f]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 225
	Region 0: Memory at e8000000 (64-bit, prefetchable) [size=128M]
	Region 2: Memory at f0000000 (64-bit, prefetchable) [size=8M]
	Region 4: I/O ports at f000 [size=256]
	Region 5: Memory at feb00000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu


For more detailed info, such as the full output from lspci, see the attachments at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777245
I filed the bug there first (because it appeared after a kernel update), but I am reposting it here since this may be a more targeted audience.
Comment 1 Andrey Grodzovsky 2018-06-23 23:58:35 UTC
It seems that your SW queue for submitting commands to the HW is full, so Xorg is stuck waiting for space to become available in the queue for the new command. We have changed the architecture of the SW scheduler since then. Are you able to build the latest stable kernel from https://www.kernel.org/ and check whether the problem goes away?
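The failure mode described in this comment can be illustrated with a toy user-space model. This is not the amdgpu scheduler code: a producer (standing in for Xorg's submission path through `amdgpu_job_submit` and `amd_sched_entity_push_job`) pushes jobs into a fixed-capacity queue and blocks while it is full, and a consumer thread stands in for the scheduler that hands jobs to the hardware. The capacity, the timeout, and the names `JOB_QUEUE_SIZE`, `submit_job` and `run` are hypothetical. The timeout exists only so the demo terminates; in the trace, Xorg appears to be in uninterruptible sleep (state `D`) with no such timeout, which is what triggers the hung-task report.

```python
import queue
import threading
import time

JOB_QUEUE_SIZE = 4  # hypothetical capacity of the software job queue


def submit_job(q: queue.Queue, job: int, timeout: float) -> bool:
    """Push a job, blocking while the queue is full.

    Returns False if no slot became free within `timeout` seconds. The
    timeout exists only so this toy model terminates.
    """
    try:
        q.put(job, block=True, timeout=timeout)
        return True
    except queue.Full:
        return False


def run(consumer_alive: bool, jobs: int = 10, timeout: float = 0.5) -> int:
    """Submit `jobs` jobs and return how many were accepted."""
    q: queue.Queue = queue.Queue(maxsize=JOB_QUEUE_SIZE)
    stop = threading.Event()

    def consumer() -> None:
        # Stands in for the scheduler thread that feeds jobs to the hardware.
        while not stop.is_set():
            try:
                q.get(timeout=0.01)
            except queue.Empty:
                continue
            time.sleep(0.001)  # pretend each job takes a little GPU time

    worker = threading.Thread(target=consumer, daemon=True)
    if consumer_alive:
        worker.start()

    accepted = 0
    for job in range(jobs):
        if not submit_job(q, job, timeout):
            break  # a real submitter would still be waiting here
        accepted += 1

    stop.set()
    if consumer_alive:
        worker.join()
    return accepted


if __name__ == "__main__":
    print("consumer running, jobs accepted:", run(consumer_alive=True))
    print("consumer stalled, jobs accepted:", run(consumer_alive=False))
```

When the consumer stalls, only `JOB_QUEUE_SIZE` jobs are accepted and the next submission waits until it gives up; a kernel-side wait without a timeout never gives up, so the submitting process stays blocked.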
Comment 2 Olaf 'Rhialto' Seibert 2018-06-27 21:18:42 UTC
On the Launchpad ticket I got a request to try a pre-built mainline kernel from Ubuntu, so I had this one tested:
Linux xa-xubu 4.18.0-041800rc1-generic #201806162031 SMP Sun Jun 17 00:34:22 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

and I have now received the report that it works. Hopefully this kernel is close enough to the one you meant.

So it seems that the bug is indeed fixed by the changes you mention.
I suppose the fix still has to trickle down into the regular Ubuntu kernels.
Comment 3 Andrey Grodzovsky 2018-06-27 21:24:53 UTC
(In reply to Olaf 'Rhialto' Seibert from comment #2)

Good to know. Yes, this is already upstream and should reach Ubuntu once they update their distribution kernels.

I have moved the ticket to RESOLVED.

Andrey


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.