Bug 104522 - [drm] GPU HANG: ecode 7:0:0x87f3fffe, reason: Hang on rcs0, action: reset
Summary: [drm] GPU HANG: ecode 7:0:0x87f3fffe, reason: Hang on rcs0, action: reset
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) All
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-07 09:26 UTC by Knut Herbert
Modified: 2018-12-18 06:10 UTC

See Also:
i915 platform: HSW
i915 features: GPU hang


Attachments
crash dump from /sys/class/drm/card0/error (15.27 KB, text/plain)
2018-01-07 09:27 UTC, Knut Herbert
New crash log on 4.15.4 (7.13 KB, text/plain)
2018-02-26 18:57 UTC, Knut Herbert
Still reproducible on 4.16.5 (6.77 KB, text/plain)
2018-05-03 17:34 UTC, Knut Herbert

Description Knut Herbert 2018-01-07 09:26:51 UTC
Happens after wakeup from pm-suspend:

[ 9159.109562] [drm] GPU HANG: ecode 7:0:0x87f3fffe, reason: Hang on rcs0, action: reset
[ 9159.109566] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 9159.109567] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 9159.109569] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 9159.109570] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 9159.109572] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 9159.109717] i915 0000:00:02.0: Resetting chip after gpu hang
[ 9169.084229] i915 0000:00:02.0: Resetting chip after gpu hang

Attached crash dump from /sys/class/drm/card0/error.
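
For anyone collecting the same data: the error state in sysfs lives in kernel memory and is lost on reboot, so it is worth copying it to a regular file right after the hang. A minimal Python sketch, assuming the path from the kernel message above; the function name `save_error_state` and the output file name pattern are made up for illustration, and reading the sysfs file usually needs root:

```python
import time
from pathlib import Path

# Path taken from the kernel message above; card0 is the device on this machine.
ERROR_STATE = Path("/sys/class/drm/card0/error")


def save_error_state(src: Path = ERROR_STATE, dest_dir: Path = Path(".")) -> Path:
    """Copy the i915 error state to a timestamped file and return its path.

    Hypothetical helper for illustration, not part of any i915 tooling.
    """
    data = src.read_bytes()  # read until EOF; sysfs files report size 0
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"i915-error-{stamp}.txt"
    dest.write_bytes(data)
    return dest
```

Calling `save_error_state()` with no arguments writes the copy into the current directory; the resulting file is what gets attached to the bug.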
Comment 1 Knut Herbert 2018-01-07 09:27:28 UTC
Created attachment 136594 [details]
crash dump from /sys/class/drm/card0/error
Comment 2 Knut Herbert 2018-01-07 09:40:22 UTC
[ 9141.547842] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[ 9141.547860] [drm:cpt_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[ 9141.570848] PM: suspend entry (deep)
[ 9141.570849] PM: Syncing filesystems ... done.
[ 9141.573606] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 9141.574743] OOM killer disabled.
[ 9141.574744] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 9141.575864] Suspending console(s) (use no_console_suspend to debug)
[ 9141.576178] sd 4:0:0:0: [sde] Synchronizing SCSI cache
[ 9141.576207] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
[ 9141.576225] sd 4:0:0:0: [sde] Stopping disk
[ 9141.576238] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
[ 9141.576263] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[ 9141.576279] sd 3:0:0:0: [sdd] Stopping disk
[ 9141.576294] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 9141.576306] sd 2:0:0:0: [sdc] Stopping disk
[ 9141.576333] sd 1:0:0:0: [sdb] Stopping disk
[ 9141.576364] sd 0:0:0:0: [sda] Stopping disk
[ 9141.576680] e1000e: EEE TX LPI TIMER: 00000011
[ 9142.081495] Display power well on
[ 9142.081519] ------------[ cut here ]------------
[ 9142.081593] WARNING: CPU: 0 PID: 2860 at /build/linux-ryBv1B/linux-4.14.12/drivers/gpu/drm/i915/intel_display.c:8825 hsw_enable_pc8+0x654/0x6c0 [i915]
[ 9142.081594] Modules linked in: binfmt_misc snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support evdev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul snd_hda_codec_realtek crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf pcspkr i915 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer drm_kms_helper snd sg drm mei_me soundcore i2c_algo_bit mei lpc_ich mfd_core battery video button nfsd auth_rpcgss nfs_acl lockd grace sunrpc loop ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod sd_mod ahci
[ 9142.081667]  libahci libata crc32c_intel i2c_i801 scsi_mod ehci_pci ehci_hcd xhci_pci xhci_hcd e1000e ptp pps_core usbcore usb_common fan thermal
[ 9142.081685] CPU: 0 PID: 2860 Comm: kworker/u4:31 Not tainted 4.14.0-3-amd64 #1 Debian 4.14.12-2
[ 9142.081687] Hardware name:                  /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
[ 9142.081695] Workqueue: events_unbound async_run_entry_fn
[ 9142.081698] task: ffff8a2055533180 task.stack: ffffae6702bc8000
[ 9142.081762] RIP: 0010:hsw_enable_pc8+0x654/0x6c0 [i915]
[ 9142.081764] RSP: 0018:ffffae6702bcbda0 EFLAGS: 00010286
[ 9142.081767] RAX: 0000000000000015 RBX: ffff8a2059b58000 RCX: ffffffffaa24d248
[ 9142.081769] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000202
[ 9142.081771] RBP: ffff8a2059b58330 R08: 0000000000000000 R09: 0000000000000015
[ 9142.081773] R10: 0000000000000700 R11: 0000000000000000 R12: ffff8a2059b58340
[ 9142.081774] R13: ffffffffc09aa75d R14: 0000000000000000 R15: ffffffffaa075165
[ 9142.081777] FS:  0000000000000000(0000) GS:ffff8a205fa00000(0000) knlGS:0000000000000000
[ 9142.081779] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9142.081781] CR2: 00007ff33bd235e0 CR3: 000000003e00a006 CR4: 00000000000606f0
[ 9142.081783] Call Trace:
[ 9142.081827]  i915_drm_suspend_late+0x13f/0x150 [i915]
[ 9142.081862]  ? i915_pm_poweroff_late+0x30/0x30 [i915]
[ 9142.081867]  dpm_run_callback+0x4b/0x130
[ 9142.081872]  __device_suspend_late+0x8c/0x160
[ 9142.081876]  async_suspend_late+0x1a/0x90
[ 9142.081881]  async_run_entry_fn+0x33/0x160
[ 9142.081888]  process_one_work+0x185/0x380
[ 9142.081892]  worker_thread+0x2e/0x390
[ 9142.081897]  ? process_one_work+0x380/0x380
[ 9142.081902]  kthread+0x118/0x130
[ 9142.081908]  ? kthread_create_on_node+0x70/0x70
[ 9142.081914]  ret_from_fork+0x1f/0x30
[ 9142.081917] Code: e8 0d eb 98 e8 0f ff e9 9a fb ff ff 48 c7 c7 17 a8 9a c0 e8 fa ea 98 e8 0f ff e9 7a fb ff ff 48 c7 c7 77 a7 9a c0 e8 e7 ea 98 e8 <0f> ff e9 64 fa ff ff 48 c7 c7 bc a7 9a c0 e8 d4 ea 98 e8 0f ff
[ 9142.081973] ---[ end trace 0ce398742da10540 ]---
[ 9142.122490] ACPI: Preparing to enter system sleep state S3
[ 9142.123477] PM: Saving platform NVS memory
[ 9142.123501] Disabling non-boot CPUs ...
[ 9142.139232] smpboot: CPU 1 is now offline
[ 9142.141116] ACPI: Low-level resume complete
[ 9142.141169] PM: Restoring platform NVS memory
[ 9142.143510] Enabling non-boot CPUs ...
[ 9142.143604] x86: Booting SMP configuration:
[ 9142.143605] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 9142.145218]  cache: parent cpu1 should not be sleeping
[ 9142.145351] CPU1 is up
[ 9142.146675] ACPI: Waking up from system sleep state S3
[ 9142.171758] sd 0:0:0:0: [sda] Starting disk
[ 9142.171804] sd 1:0:0:0: [sdb] Starting disk
[ 9142.171833] sd 2:0:0:0: [sdc] Starting disk
[ 9142.171845] sd 3:0:0:0: [sdd] Starting disk
[ 9142.171873] sd 4:0:0:0: [sde] Starting disk
[ 9142.339704] OOM killer enabled.
[ 9142.339706] Restarting tasks ... done.
[ 9142.340402] video LNXVIDEO:00: Restoring backlight state
[ 9142.340406] PM: suspend exit
[ 9142.533891] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9142.534404] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 9142.534406] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 9142.534407] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 9142.535200] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 9142.535202] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 9142.535203] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 9142.535492] ata5.00: configured for UDMA/100
[ 9145.297869] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 9147.579688] ata4: link is slow to respond, please be patient (ready=0)
[ 9147.579710] ata1: link is slow to respond, please be patient (ready=0)
[ 9147.579719] ata2: link is slow to respond, please be patient (ready=0)
[ 9147.587711] ata3: link is slow to respond, please be patient (ready=0)
[ 9150.947771] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9151.063785] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9151.130374] ata3.00: configured for UDMA/133
[ 9151.179755] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9151.248397] ata4.00: configured for UDMA/133
[ 9151.363809] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 9151.371950] ata2.00: configured for UDMA/133
[ 9151.519382] ata1.00: configured for UDMA/133
[ 9159.109562] [drm] GPU HANG: ecode 7:0:0x87f3fffe, reason: Hang on rcs0, action: reset
[ 9159.109566] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 9159.109567] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 9159.109569] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 9159.109570] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 9159.109572] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 9159.109717] i915 0000:00:02.0: Resetting chip after gpu hang
[ 9169.084229] i915 0000:00:02.0: Resetting chip after gpu hang
[ 9169.084362] [drm:i915_reset [i915]] *ERROR* GPU recovery failed
Comment 3 Chris Wilson 2018-01-07 12:08:43 UTC
The ringbuffer was trashed by, I guess, the BIOS. Is this 100% reproducible? Do you know if it recently got worse? Could you also test drm-tip [https://cgit.freedesktop.org/drm-tip]?
Comment 4 Knut Herbert 2018-01-07 17:19:33 UTC
On the next wakeup after pm-suspend, the kernel log is a little different:

[28830.469098] PM: suspend entry (deep)
[28830.469100] PM: Syncing filesystems ... done.
[28830.474969] Freezing user space processes ... (elapsed 0.000 seconds) done.
[28830.475916] OOM killer disabled.
[28830.475917] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[28830.477040] Suspending console(s) (use no_console_suspend to debug)
[28830.477369] sd 4:0:0:0: [sde] Synchronizing SCSI cache
[28830.477400] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
[28830.477427] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
[28830.477454] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[28830.477472] sd 3:0:0:0: [sdd] Stopping disk
[28830.477531] sd 1:0:0:0: [sdb] Stopping disk
[28830.477535] sd 2:0:0:0: [sdc] Stopping disk
[28830.477539] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[28830.477613] sd 0:0:0:0: [sda] Stopping disk
[28830.477898] e1000e: EEE TX LPI TIMER: 00000011
[28830.481146] sd 4:0:0:0: [sde] Stopping disk
[28830.988843] Display power well on
[28830.988856] ------------[ cut here ]------------
[28830.988886] WARNING: CPU: 1 PID: 6249 at /build/linux-ryBv1B/linux-4.14.12/drivers/gpu/drm/i915/intel_display.c:8825 hsw_enable_pc8+0x654/0x6c0 [i915]
[28830.988887] Modules linked in: binfmt_misc snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support evdev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul snd_hda_codec_realtek crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf pcspkr i915 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer drm_kms_helper snd sg drm mei_me soundcore i2c_algo_bit mei lpc_ich mfd_core battery video button nfsd auth_rpcgss nfs_acl lockd grace sunrpc loop ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod sd_mod ahci
[28830.988914]  libahci libata crc32c_intel i2c_i801 scsi_mod ehci_pci ehci_hcd xhci_pci xhci_hcd e1000e ptp pps_core usbcore usb_common fan thermal
[28830.988920] CPU: 1 PID: 6249 Comm: kworker/u4:24 Tainted: G        W       4.14.0-3-amd64 #1 Debian 4.14.12-2
[28830.988921] Hardware name:                  /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
[28830.988924] Workqueue: events_unbound async_run_entry_fn
[28830.988925] task: ffff8a1ff9493080 task.stack: ffffae6702670000
[28830.988947] RIP: 0010:hsw_enable_pc8+0x654/0x6c0 [i915]
[28830.988948] RSP: 0018:ffffae6702673da0 EFLAGS: 00010286
[28830.988949] RAX: 0000000000000015 RBX: ffff8a2059b58000 RCX: ffffffffaa24d248
[28830.988949] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000202
[28830.988950] RBP: ffff8a2059b58330 R08: 0000000000000000 R09: 0000000000000015
[28830.988950] R10: 0000000000000700 R11: 0000000000000000 R12: ffff8a2059b58340
[28830.988951] R13: ffffffffc09aa75d R14: 0000000000000000 R15: ffffffffaa075165
[28830.988952] FS:  0000000000000000(0000) GS:ffff8a205fb00000(0000) knlGS:0000000000000000
[28830.988952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[28830.988953] CR2: 00007f783d0115e0 CR3: 000000003e00a004 CR4: 00000000000606e0
[28830.988954] Call Trace:
[28830.988969]  i915_drm_suspend_late+0x13f/0x150 [i915]
[28830.988981]  ? i915_pm_poweroff_late+0x30/0x30 [i915]
[28830.988983]  dpm_run_callback+0x4b/0x130
[28830.988985]  __device_suspend_late+0x8c/0x160
[28830.988986]  async_suspend_late+0x1a/0x90
[28830.988988]  async_run_entry_fn+0x33/0x160
[28830.988990]  process_one_work+0x185/0x380
[28830.988992]  worker_thread+0x2e/0x390
[28830.988993]  ? process_one_work+0x380/0x380
[28830.988995]  kthread+0x118/0x130
[28830.988996]  ? kthread_create_on_node+0x70/0x70
[28830.988999]  ret_from_fork+0x1f/0x30
[28830.989000] Code: e8 0d eb 98 e8 0f ff e9 9a fb ff ff 48 c7 c7 17 a8 9a c0 e8 fa ea 98 e8 0f ff e9 7a fb ff ff 48 c7 c7 77 a7 9a c0 e8 e7 ea 98 e8 <0f> ff e9 64 fa ff ff 48 c7 c7 bc a7 9a c0 e8 d4 ea 98 e8 0f ff
[28830.989017] ---[ end trace 0ce398742da10541 ]---
[28831.028237] ACPI: Preparing to enter system sleep state S3
[28831.028742] PM: Saving platform NVS memory
[28831.028756] Disabling non-boot CPUs ...
[28831.045142] smpboot: CPU 1 is now offline
[28831.046687] ACPI: Low-level resume complete
[28831.046742] PM: Restoring platform NVS memory
[28831.049059] Enabling non-boot CPUs ...
[28831.049154] x86: Booting SMP configuration:
[28831.049155] smpboot: Booting Node 0 Processor 1 APIC 0x2
[28831.050780]  cache: parent cpu1 should not be sleeping
[28831.050912] CPU1 is up
[28831.052240] ACPI: Waking up from system sleep state S3
[28831.077209] sd 0:0:0:0: [sda] Starting disk
[28831.077284] sd 1:0:0:0: [sdb] Starting disk
[28831.077315] sd 2:0:0:0: [sdc] Starting disk
[28831.077340] sd 3:0:0:0: [sdd] Starting disk
[28831.077365] sd 4:0:0:0: [sde] Starting disk
[28831.241576] OOM killer enabled.
[28831.241577] Restarting tasks ... done.
[28831.244123] video LNXVIDEO:00: Restoring backlight state
[28831.244147] PM: suspend exit
[28831.436003] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[28831.436538] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[28831.436540] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[28831.436541] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[28831.437345] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[28831.437347] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[28831.437348] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[28831.437811] ata5.00: configured for UDMA/100
[28834.191713] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[28836.477548] ata4: link is slow to respond, please be patient (ready=0)
[28836.477562] ata2: link is slow to respond, please be patient (ready=0)
[28836.481845] ata1: link is slow to respond, please be patient (ready=0)
[28836.485654] ata3: link is slow to respond, please be patient (ready=0)
[28839.665579] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[28839.853948] ata3.00: configured for UDMA/133
[28839.901560] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[28840.094309] ata4.00: configured for UDMA/133
[28840.137563] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[28840.265604] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[28840.336109] ata2.00: configured for UDMA/133
[28840.425308] ata1.00: configured for UDMA/133
Comment 5 Knut Herbert 2018-02-26 18:57:34 UTC
Created attachment 137617 [details]
New crash log on 4.15.4

Still happens on 4.15.4. Sorry, I can't try drm-tip. It's definitely reproducible.
Comment 6 Jani Saarinen 2018-03-29 07:11:33 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), could you do that as well to check whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
Comment 7 Knut Herbert 2018-03-29 18:32:05 UTC
Bug is still there in 4.15.11:

[45436.112193] systemd[1]: apt-daily.timer: Adding 5h 11min 58.843377s random time.
[45436.707375] systemd[1]: apt-daily.timer: Adding 1h 48min 4.775470s random time.
[45436.782703] systemd[1]: apt-daily.timer: Adding 11h 47min 51.149293s random time.
[45436.862284] systemd[1]: apt-daily.timer: Adding 11h 42min 25.375924s random time.
[45436.921741] systemd: 45 output lines suppressed due to ratelimiting
[45454.838048] device-mapper: uevent: version 1.0.3
[45454.839505] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[45455.933357] SGI XFS with ACLs, security attributes, realtime, no debug enabled
[45455.940026] JFS: nTxBlock = 8192, nTxLock = 65536
[45455.954893] ntfs: driver 2.1.32 [Flags: R/O MODULE].
[45455.968204] QNX4 filesystem 0.2.3 registered.
[45455.999742] Btrfs loaded, crc32c=crc32c-intel
[45456.004883] fuse init (API version 7.26)
[45583.576848] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[45583.576866] [drm:cpt_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[45583.599914] PM: suspend entry (deep)
[45583.599915] PM: Syncing filesystems ... done.
[45583.604312] Freezing user space processes ... (elapsed 0.000 seconds) done.
[45583.605305] OOM killer disabled.
[45583.605305] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[45583.606451] Suspending console(s) (use no_console_suspend to debug)
[45583.607506] e1000e: EEE TX LPI TIMER: 00000011
[45583.621345] sd 4:0:0:0: [sde] Synchronizing SCSI cache
[45583.621367] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[45583.621374] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
[45583.621380] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[45583.621384] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
[45583.621459] sd 0:0:0:0: [sda] Stopping disk
[45583.621473] sd 1:0:0:0: [sdb] Stopping disk
[45583.621473] sd 3:0:0:0: [sdd] Stopping disk
[45583.621477] sd 2:0:0:0: [sdc] Stopping disk
[45583.625060] sd 4:0:0:0: [sde] Stopping disk
[45584.135126] ------------[ cut here ]------------
[45584.135127] Display power well on
[45584.135176] WARNING: CPU: 1 PID: 12794 at /build/linux-jIx23a/linux-4.15.11/drivers/gpu/drm/i915/intel_display.c:8696 hsw_enable_pc8+0x6c9/0x730 [i915]
[45584.135176] Modules linked in: fuse btrfs zstd_decompress zstd_compress xxhash ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs dm_mod cpuid binfmt_misc snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support evdev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore snd_hda_codec_realtek intel_rapl_perf i915 snd_hda_codec_generic drm_kms_helper pcspkr snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm mei_me snd_timer lpc_ich drm mfd_core snd i2c_algo_bit sg soundcore mei video button nfsd auth_rpcgss nfs_acl lockd grace sunrpc loop ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov
[45584.135211]  async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod sd_mod crc32c_intel ahci libahci i2c_i801 libata scsi_mod xhci_pci e1000e ehci_pci xhci_hcd ehci_hcd ptp pps_core usbcore usb_common fan thermal
[45584.135223] CPU: 1 PID: 12794 Comm: kworker/u4:26 Not tainted 4.15.0-2-amd64 #1 Debian 4.15.11-1
[45584.135224] Hardware name:                  /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
[45584.135228] Workqueue: events_unbound async_run_entry_fn
[45584.135250] RIP: 0010:hsw_enable_pc8+0x6c9/0x730 [i915]
[45584.135251] RSP: 0018:ffffbcee42877da0 EFLAGS: 00010286
[45584.135252] RAX: 0000000000000000 RBX: ffff98e058680000 RCX: ffffffffa3c4d788
[45584.135253] RDX: ffffffffa3c4d788 RSI: 0000000000000086 RDI: 0000000000000202
[45584.135253] RBP: ffff98e058680358 R08: 0000000000000000 R09: 0000000000000015
[45584.135254] R10: 0000000000000700 R11: 0000000000000000 R12: ffff98e058680368
[45584.135255] R13: ffffffffc08ae9ef R14: 0000000000000000 R15: ffffffffa3a7fde5
[45584.135256] FS:  0000000000000000(0000) GS:ffff98e05fb00000(0000) knlGS:0000000000000000
[45584.135257] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45584.135258] CR2: 00007fed79fe15b0 CR3: 000000008940a001 CR4: 00000000000606e0
[45584.135258] Call Trace:
[45584.135274]  i915_drm_suspend_late+0x100/0x160 [i915]
[45584.135279]  ? pci_pm_poweroff_late+0x30/0x30
[45584.135281]  dpm_run_callback+0x4d/0x130
[45584.135283]  __device_suspend_late+0x8c/0x160
[45584.135285]  async_suspend_late+0x1a/0x90
[45584.135288]  async_run_entry_fn+0x39/0x160
[45584.135291]  process_one_work+0x17b/0x370
[45584.135293]  worker_thread+0x2e/0x390
[45584.135294]  ? process_one_work+0x370/0x370
[45584.135296]  kthread+0x113/0x130
[45584.135297]  ? kthread_create_worker_on_cpu+0x70/0x70
[45584.135300]  ret_from_fork+0x35/0x40
[45584.135302] Code: e8 ad 37 43 e2 0f 0b e9 67 fb ff ff 48 c7 c7 a9 ea 8a c0 e8 9a 37 43 e2 0f 0b e9 47 fb ff ff 48 c7 c7 09 ea 8a c0 e8 87 37 43 e2 <0f> 0b e9 01 fa ff ff 48 c7 c7 4e ea 8a c0 e8 74 37 43 e2 0f 0b
[45584.135323] ---[ end trace 715499fae7217932 ]---
Comment 8 Knut Herbert 2018-03-29 18:37:36 UTC
Maybe related to this?

https://patchwork.kernel.org/patch/10305191/
Comment 9 Jani Saarinen 2018-04-25 11:02:17 UTC
Imre, any advice here?
Comment 10 Knut Herbert 2018-05-03 17:34:34 UTC
Created attachment 139322 [details]
Still reproducible on 4.16.5

Bug is still reproducible on 4.16.5
Comment 11 Jani Saarinen 2018-05-04 12:16:41 UTC
Hi,
Are you able to test our latest tip: https://cgit.freedesktop.org/drm-tip?
Or does this help, Chris, Mika?
Comment 12 Francesco Balestrieri 2018-05-14 12:34:06 UTC
Ping reporter, can you try drm-tip?
Comment 13 Knut Herbert 2018-05-14 17:39:13 UTC
(In reply to Francesco Balestrieri from comment #12)
> Ping reporter, can you try drm-tip?

I'm afraid I can't try drm-tip at the moment. Maybe I can try it out when I have more time.
Comment 14 Knut Herbert 2018-07-12 20:39:14 UTC
Bug still exists on 4.17.6:

[  159.101925] ------------[ cut here ]------------
[  159.101927] Display power well on
[  159.101998] WARNING: CPU: 1 PID: 1425 at /build/linux-fVnMBb/linux-4.17.6/drivers/gpu/drm/i915/intel_display.c:8805 hsw_enable_pc8+0x5f6/0x640 [i915]
[  159.101999] Modules linked in: binfmt_misc snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support evdev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul i915 drm_kms_helper ghash_clmulni_intel intel_cstate snd_hda_codec_realtek intel_uncore intel_rapl_perf snd_hda_codec_generic drm pcspkr i2c_algo_bit snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer mei_me snd soundcore sg mei lpc_ich video button nfsd auth_rpcgss nfs_acl lockd grace sunrpc loop ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod sd_mod ahci libahci i2c_i801
[  159.102068]  crc32c_intel libata scsi_mod thermal fan e1000e xhci_pci xhci_hcd ehci_pci ehci_hcd usbcore usb_common
[  159.102082] CPU: 1 PID: 1425 Comm: kworker/u4:30 Not tainted 4.17.0-1-amd64 #1 Debian 4.17.6-1
[  159.102084] Hardware name:  /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
[  159.102090] Workqueue: events_unbound async_run_entry_fn
[  159.102134] RIP: 0010:hsw_enable_pc8+0x5f6/0x640 [i915]
[  159.102136] RSP: 0018:ffffb34482973d88 EFLAGS: 00010286
[  159.102138] RAX: 0000000000000000 RBX: ffff986b97760000 RCX: ffffffff9ac4c988
[  159.102140] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
[  159.102142] RBP: ffff986b97760358 R08: 0000000000000000 R09: 0000000000000015
[  159.102143] R10: 0000000000000700 R11: 0000000000000000 R12: ffff986b97760368
[  159.102145] R13: ffffffffc09873d6 R14: 0000000000000000 R15: ffffffff9aa9e492
[  159.102147] FS:  0000000000000000(0000) GS:ffff986b9fb00000(0000) knlGS:0000000000000000
[  159.102149] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  159.102151] CR2: 00007fa81a80c490 CR3: 000000009220a006 CR4: 00000000000606e0
[  159.102152] Call Trace:
[  159.102163]  ? pci_pm_poweroff_late+0x30/0x30
[  159.102191]  i915_drm_suspend_late+0xcf/0x130 [i915]
[  159.102197]  dpm_run_callback+0x4d/0x130
[  159.102202]  ? pci_pm_poweroff_late+0x30/0x30
[  159.102205]  __device_suspend_late+0xba/0x150
[  159.102209]  async_suspend_late+0x1a/0x90
[  159.102213]  async_run_entry_fn+0x39/0x160
[  159.102217]  process_one_work+0x17b/0x360
[  159.102221]  worker_thread+0x2e/0x390
[  159.102224]  ? process_one_work+0x360/0x360
[  159.102228]  kthread+0x113/0x130
[  159.102233]  ? kthread_create_worker_on_cpu+0x70/0x70
[  159.102238]  ret_from_fork+0x35/0x40
[  159.102241] Code: ff ff e8 4e f7 35 d9 0f 0b e9 e2 fb ff ff e8 42 f7 35 d9 0f 0b e9 25 fc ff ff e8 36 f7 35 d9 0f 0b e9 fa fa ff ff e8 2a f7 35 d9 <0f> 0b e9 cd fa ff ff e8 1e f7 35 d9 0f 0b e9 67 fb ff ff e8 12
[  159.102292] ---[ end trace c34c2cc43c8e9155 ]---
Comment 15 Knut Herbert 2018-09-08 09:34:23 UTC
Still happens in 4.18.6:

[   33.824181] random: crng init done
[   33.824185] random: 7 urandom warning(s) missed due to ratelimiting
[ 5454.411990] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[ 5454.412015] [drm:cpt_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[ 5454.446855] PM: suspend entry (deep)
[ 5454.446856] PM: Syncing filesystems ... done.
[ 5454.454295] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 5454.455469] OOM killer disabled.
[ 5454.455469] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 5454.456600] Suspending console(s) (use no_console_suspend to debug)
[ 5454.457626] e1000e: EEE TX LPI TIMER: 00000011
[ 5454.475691] sd 4:0:0:0: [sde] Synchronizing SCSI cache
[ 5454.475737] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[ 5454.475741] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 5454.475760] sd 4:0:0:0: [sde] Stopping disk
[ 5454.475762] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
[ 5454.475769] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
[ 5454.475841] sd 2:0:0:0: [sdc] Stopping disk
[ 5454.475845] sd 3:0:0:0: [sdd] Stopping disk
[ 5454.475848] sd 0:0:0:0: [sda] Stopping disk
[ 5454.479403] sd 1:0:0:0: [sdb] Stopping disk
[ 5454.984440] ------------[ cut here ]------------
[ 5454.984441] Display power well on
[ 5454.984499] WARNING: CPU: 0 PID: 2077 at /build/linux-hJelb7/linux-4.18.6/drivers/gpu/drm/i915/intel_display.c:8932 hsw_enable_pc8+0x5e9/0x630 [i915]
[ 5454.984500] Modules linked in: binfmt_misc iTCO_wdt iTCO_vendor_support evdev intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf snd_hda_codec_realtek pcspkr snd_hda_codec_generic snd_hda_intel snd_hda_codec i915 snd_hda_core sg snd_hwdep snd_pcm snd_timer drm_kms_helper snd soundcore drm mei_me i2c_algo_bit mei lpc_ich video button pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace loop sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod sd_mod crc32c_intel
[ 5454.984551]  i2c_i801 fan thermal ahci libahci libata xhci_pci ehci_pci scsi_mod ehci_hcd xhci_hcd e1000e usbcore usb_common
[ 5454.984562] CPU: 0 PID: 2077 Comm: kworker/u4:31 Not tainted 4.18.0-1-amd64 #1 Debian 4.18.6-1
[ 5454.984563] Hardware name:  /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
[ 5454.984568] Workqueue: events_unbound async_run_entry_fn
[ 5454.984602] RIP: 0010:hsw_enable_pc8+0x5e9/0x630 [i915]
[ 5454.984603] Code: ff e8 bb 0c ff cd 0f 0b e9 ef fb ff ff e8 af 0c ff cd 0f 0b e9 32 fc ff ff e8 a3 0c ff cd 0f 0b e9 07 fb ff ff e8 97 0c ff cd <0f> 0b e9 da fa ff ff e8 8b 0c ff cd 0f 0b e9 74 fb ff ff e8 7f 0c
[ 5454.984637] RSP: 0018:ffffa97d0271bd88 EFLAGS: 00010286
[ 5454.984639] RAX: 0000000000000000 RBX: ffff925fd6878000 RCX: 0000000000000006
[ 5454.984640] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff925fdfa16730
[ 5454.984641] RBP: ffff925fd6878350 R08: 0000000000000000 R09: 0000000000000015
[ 5454.984642] R10: 0000000000000700 R11: 0000000000000000 R12: ffff925fd6878360
[ 5454.984643] R13: ffffffffc08fdfe2 R14: 0000000000000002 R15: ffffffff8f6a5802
[ 5454.984645] FS:  0000000000000000(0000) GS:ffff925fdfa00000(0000) knlGS:0000000000000000
[ 5454.984647] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5454.984648] CR2: 00007f0268700460 CR3: 000000003c00a004 CR4: 00000000000606f0
[ 5454.984649] Call Trace:
[ 5454.984674]  i915_drm_suspend_late+0xcf/0x130 [i915]
[ 5454.984679]  ? pci_pm_poweroff_late+0x30/0x30
[ 5454.984682]  dpm_run_callback+0x4e/0x160
[ 5454.984685]  ? pci_pm_poweroff_late+0x30/0x30
[ 5454.984688]  __device_suspend_late+0xba/0x150
[ 5454.984691]  async_suspend_late+0x1a/0x90
[ 5454.984694]  async_run_entry_fn+0x39/0x160
[ 5454.984697]  process_one_work+0x195/0x370
[ 5454.984700]  worker_thread+0x30/0x390
[ 5454.984702]  ? process_one_work+0x370/0x370
[ 5454.984704]  kthread+0x113/0x130
[ 5454.984706]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 5454.984710]  ret_from_fork+0x35/0x40
[ 5454.984713] ---[ end trace ec2e273b78cf4b2d ]---
Comment 16 Lakshmi 2018-10-18 15:07:10 UTC
Knut, sorry for the delay.
I couldn't find a GPU hang in the dmesg.
I do see errors related to FIFO underruns in the dmesg.

Can you please attach a dmesg from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M?

Can you please do that with the latest drm-tip? This would be really helpful for investigating this issue.
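
As a side note on what is being requested: drm.debug is a bitmask of DRM debug categories, and 0x1e selects several of them at once. The sketch below decodes the mask and checks a kernel command line for the two parameters. It assumes the category bit values from the kernel's include/drm/drm_print.h; the helper names `decode_drm_debug` and `missing_params` are hypothetical, and only the lower six bits are listed.

```python
# Debug category bits, assumed from include/drm/drm_print.h (DRM_UT_* values).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask: int) -> list[str]:
    """Return the names of the debug categories enabled by the mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


def missing_params(cmdline: str,
                   wanted=("drm.debug=0x1e", "log_buf_len=4M")) -> list[str]:
    """Return the requested parameters that are absent from a kernel command line."""
    present = cmdline.split()
    return [param for param in wanted if param not in present]


print(decode_drm_debug(0x1E))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

On a running system the command line to check can be read from /proc/cmdline. log_buf_len=4M enlarges the kernel log buffer so that the verbose debug output from boot is not overwritten before dmesg is saved.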
Comment 17 Lakshmi 2018-11-02 10:29:28 UTC
Knut, any updates here?
If there is no GPU hang with recent/latest drm-tip, I would like to close this and not include the FIFO underrun errors in this bug, as this bug was originally filed for a GPU hang.
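
To tell the two symptoms apart in a saved dmesg, something like the following Python sketch could be used. It is only an illustration: the regular expressions are written against the message formats quoted earlier in this bug and may not match other kernel versions, and the function name `classify` is hypothetical.

```python
import re

# Patterns modelled on the log lines quoted in this bug report.
HANG_RE = re.compile(
    r"\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)
UNDERRUN_RE = re.compile(r"FIFO underrun", re.IGNORECASE)


def classify(dmesg_text: str) -> dict:
    """Collect GPU hang records and count FIFO underrun lines in dmesg output."""
    hangs = []
    underruns = 0
    for line in dmesg_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs.append(match.groupdict())
        elif UNDERRUN_RE.search(line):
            underruns += 1
    return {"hangs": hangs, "underruns": underruns}
```

An empty `hangs` list with a non-zero `underruns` count would correspond to the situation described in comment 16: underrun errors but no GPU hang in the log.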
Comment 18 Knut Herbert 2018-11-02 11:12:58 UTC
Is there a Debian rootfs / live CD I can use to check this? Trying drm-tip on the system is too much effort; I don't have the time to try.
Comment 19 Francesco Balestrieri 2018-11-23 11:26:11 UTC
Setting to medium priority until we have more details.

Is the problem still about GPU hangs or something else?
Comment 20 Lakshmi 2018-11-27 07:50:33 UTC
Knut, any updates here?
If you don't notice a hang with recent kernels, I can close this bug.

Regarding the FIFO underrun errors, we have a few open issues. Before creating a new bug, please check that there is no existing open issue.
Comment 21 Lakshmi 2018-12-18 06:09:55 UTC
No feedback for many months, closing as resolved (WORKSFORME).
Please re-open if the issue persists with the latest drm-tip (https://cgit.freedesktop.org/drm-tip), and attach a dmesg from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M.

