Bug 89179 - [gen4 drm] stuck on render ring - PLL state assertion failure (3.13)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2015-02-17 01:43 UTC by Maruthi Seshidhar
Modified: 2017-07-24 22:48 UTC (History)

Attachments
syslog file (341.32 KB, text/plain)
2015-02-17 02:02 UTC, Maruthi Seshidhar

Description Maruthi Seshidhar 2015-02-17 01:43:08 UTC
On Ubuntu 14.04, whenever I open the Chrome browser (and sometimes Firefox)
and play a YouTube video, the system hangs.

A hard reset is the only way out. After rebooting, I checked /var/log/syslog,
which contains the following errors and stack trace.

Feb 17 06:48:30 coromandel dbus[701]: [system] Successfully activated service 'org.freedesktop.UDisks2'
Feb 17 06:48:30 coromandel udisksd[2484]: Acquired the name org.freedesktop.UDisks2 on the system message bus
Feb 17 06:48:37 coromandel NetworkManager[911]: <info> (eth1): IP6 addrconf timed out or failed.
Feb 17 06:48:37 coromandel NetworkManager[911]: <info> Activation (eth1) Stage 4 of 5 (IPv6 Configure Timeout) scheduled...
Feb 17 06:48:37 coromandel NetworkManager[911]: <info> Activation (eth1) Stage 4 of 5 (IPv6 Configure Timeout) started...
Feb 17 06:48:37 coromandel NetworkManager[911]: <info> Activation (eth1) Stage 4 of 5 (IPv6 Configure Timeout) complete.
Feb 17 06:48:47 coromandel kernel: [   43.387766] audit_printk_skb: 186 callbacks suppressed
Feb 17 06:48:47 coromandel kernel: [   43.387771] type=1400 audit(1424135927.026:74): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=2686 comm="apparmor_parser"
Feb 17 06:48:47 coromandel kernel: [   43.387781] type=1400 audit(1424135927.026:75): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2686 comm="apparmor_parser"
Feb 17 06:48:47 coromandel kernel: [   43.388505] type=1400 audit(1424135927.030:76): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2686 comm="apparmor_parser"
Feb 17 06:51:29 coromandel kernel: [  205.804150] [drm] stuck on render ring
Feb 17 06:51:29 coromandel kernel: [  205.804163] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 17 06:51:29 coromandel kernel: [  205.804166] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 17 06:51:29 coromandel kernel: [  205.804171] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 17 06:51:29 coromandel kernel: [  205.804179] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 17 06:51:29 coromandel kernel: [  205.804181] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 17 06:51:29 coromandel kernel: [  205.805191] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xfeff000 ctx 0) at 0xff009b4
Feb 17 06:51:29 coromandel kernel: [  205.860019] [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit banging on pin 2
Feb 17 06:51:29 coromandel kernel: [  206.316019] [drm:i915_reset] *ERROR* Failed to reset chip.
Feb 17 06:51:30 coromandel gnome-session[2103]: WARNING: App 'compiz.desktop' exited with code 1
Feb 17 06:51:30 coromandel gnome-session[2103]: WARNING: App 'compiz.desktop' respawning too quickly
Feb 17 06:51:30 coromandel gnome-session[2103]: CRITICAL: We failed, but the fail whale is dead. Sorry....
Feb 17 06:51:30 coromandel colord: device removed: xrandr-ViewSonic Corporation-VG2427WM-R9S100800922
Feb 17 06:51:31 coromandel gnome-session[2103]: WARNING: App 'compiz.desktop' respawning too quickly
Feb 17 06:51:31 coromandel gnome-session[2103]: WARNING: App 'compiz.desktop' exited with code 1
Feb 17 06:51:31 coromandel gnome-session[2103]: WARNING: App 'compiz.desktop' respawning too quickly
Feb 17 06:51:40 coromandel kernel: [  217.116245] ------------[ cut here ]------------
Feb 17 06:51:40 coromandel kernel: [  217.116287] WARNING: CPU: 1 PID: 1330 at /build/buildd/linux-3.13.0/drivers/gpu/drm/i915/intel_display.c:922 assert_pll+0x73/0x80 [i915]()
Feb 17 06:51:40 coromandel kernel: [  217.116288] PLL state assertion failure (expected on, current off)
Feb 17 06:51:40 coromandel kernel: [  217.116321] Modules linked in: pci_stub vboxpci(OX) vboxnetadp(OX) vboxnetflt(OX) vboxdrv(OX) rfcomm bnep bluetooth binfmt_misc ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 uvcvideo ipt_REJECT xt_LOG videobuf2_vmalloc videobuf2_memops xt_limit xt_tcpudp videobuf2_core xt_addrtype videodev nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack gpio_ich ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables coretemp kvm_intel snd_hda_codec_idt kvm snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event serio_raw snd_rawmidi lpc_ich snd_seq snd_seq_device snd_timer snd i915 mac_hid parport_pc video ppdev drm_kms_helper soundcore drm lp i2c_algo_bit parport e100 e1000 mii psmouse
Feb 17 06:51:40 coromandel kernel: [  217.116327] CPU: 1 PID: 1330 Comm: Xorg Tainted: G           OX 3.13.0-43-generic #72-Ubuntu
Feb 17 06:51:40 coromandel kernel: [  217.116328] Hardware name:                  /D946GZIS, BIOS TS94610J.86A.0025.2006.0703.1026 07/03/2006
Feb 17 06:51:40 coromandel kernel: [  217.116334]  00000000 00000000 ec4b7af4 c1655602 ec4b7b34 ec4b7b24 c105699e f926e2d0
Feb 17 06:51:40 coromandel kernel: [  217.116338]  ec4b7b50 00000532 f926cc24 0000039a f9219fa3 f9219fa3 00000001 00000001
Feb 17 06:51:40 coromandel kernel: [  217.116342]  00000000 ec4b7b3c c10569f3 00000009 ec4b7b34 f926e2d0 ec4b7b50 ec4b7b60
Feb 17 06:51:40 coromandel kernel: [  217.116343] Call Trace:
Feb 17 06:51:40 coromandel kernel: [  217.116351]  [<c1655602>] dump_stack+0x41/0x52
Feb 17 06:51:40 coromandel kernel: [  217.116355]  [<c105699e>] warn_slowpath_common+0x7e/0xa0
Feb 17 06:51:40 coromandel kernel: [  217.116380]  [<f9219fa3>] ? assert_pll+0x73/0x80 [i915]
Feb 17 06:51:40 coromandel kernel: [  217.116402]  [<f9219fa3>] ? assert_pll+0x73/0x80 [i915]
Feb 17 06:51:40 coromandel kernel: [  217.116405]  [<c10569f3>] warn_slowpath_fmt+0x33/0x40
Feb 17 06:51:40 coromandel kernel: [  217.116428]  [<f9219fa3>] assert_pll+0x73/0x80 [i915]
Feb 17 06:51:40 coromandel kernel: [  217.116453]  [<f921f66a>] intel_crtc_load_lut+0x19a/0x1b0 [i915]
Feb 17 06:51:40 coromandel kernel: [  217.116457]  [<c12fd2ea>] ? snprintf+0x1a/0x20
Feb 17 06:51:40 coromandel kernel: [  217.116464]  [<f8440594>] drm_fb_helper_setcmap+0x244/0x3a0 [drm_kms_helper]
Feb 17 06:51:40 coromandel kernel: [  217.116488]  [<f9225259>] ? intel_crtc_set_config+0x609/0x920 [i915]
Feb 17 06:51:40 coromandel kernel: [  217.116509]  [<f8a8f1f9>] ? drm_framebuffer_unreference+0x39/0x60 [drm]
Feb 17 06:51:40 coromandel kernel: [  217.116513]  [<c1357047>] fb_set_cmap+0x57/0x120
Feb 17 06:51:40 coromandel kernel: [  217.116519]  [<f8440215>] ? drm_fb_helper_pan_display+0xf5/0x110 [drm_kms_helper]
Feb 17 06:51:40 coromandel kernel: [  217.116522]  [<c1352d1f>] ? fb_pan_display+0xbf/0x170
Feb 17 06:51:40 coromandel kernel: [  217.116525]  [<c135dd0c>] fbcon_set_palette+0x12c/0x160
Feb 17 06:51:40 coromandel kernel: [  217.116527]  [<c1360172>] fbcon_switch+0x3b2/0x550
Feb 17 06:51:40 coromandel kernel: [  217.116532]  [<c13dd60e>] redraw_screen+0x14e/0x1f0
Feb 17 06:51:40 coromandel kernel: [  217.116534]  [<c135ebc5>] fbcon_blank+0x225/0x2e0
Feb 17 06:51:40 coromandel kernel: [  217.116538]  [<c13ddfb6>] do_unblank_screen+0xa6/0x1c0
Feb 17 06:51:40 coromandel kernel: [  217.116542]  [<c13d5251>] complete_change_console+0x51/0xe0
Feb 17 06:51:40 coromandel kernel: [  217.116544]  [<c13d628f>] vt_ioctl+0xfaf/0x1080
Feb 17 06:51:40 coromandel kernel: [  217.116550]  [<c13d52e0>] ? complete_change_console+0xe0/0xe0
Feb 17 06:51:40 coromandel kernel: [  217.116553]  [<c13cb273>] tty_ioctl+0x233/0xa20
Feb 17 06:51:40 coromandel kernel: [  217.116556]  [<c118dc8e>] ? dput+0x1e/0x150
Feb 17 06:51:40 coromandel kernel: [  217.116560]  [<c11b3ec3>] ? fsnotify_put_event+0x53/0x90
Feb 17 06:51:40 coromandel kernel: [  217.116562]  [<c11b3b68>] ? fsnotify+0x208/0x2d0
Feb 17 06:51:40 coromandel kernel: [  217.116564]  [<c13cb040>] ? no_tty+0x30/0x30
Feb 17 06:51:40 coromandel kernel: [  217.116567]  [<c118acb2>] do_vfs_ioctl+0x2e2/0x4d0
Feb 17 06:51:40 coromandel kernel: [  217.116570]  [<c117beb1>] ? __sb_end_write+0x31/0x70
Feb 17 06:51:40 coromandel kernel: [  217.116572]  [<c117a405>] ? vfs_write+0x165/0x1b0
Feb 17 06:51:40 coromandel kernel: [  217.116575]  [<c118af00>] SyS_ioctl+0x60/0x80
Feb 17 06:51:40 coromandel kernel: [  217.116578]  [<c166398d>] sysenter_do_call+0x12/0x12
Feb 17 06:51:40 coromandel kernel: [  217.116579] ---[ end trace 91edc47b53e5bead ]---
Feb 17 06:51:40 coromandel kernel: [  217.116648] ------------[ cut here ]------------
Feb 17 06:51:40 coromandel kernel: [  217.116670] WARNING: CPU: 1 PID: 1330 at /build/buildd/linux-3.13.0/drivers/gpu/drm/i915/intel_display.c:922 assert_pll+0x73/0x80 [i915]()
Feb 17 06:51:40 coromandel kernel: [  217.116670] PLL state assertion failure (expected on, current off)
Feb 17 06:51:40 coromandel kernel: [  217.116701] Modules linked in: pci_stub vboxpci(OX) vboxnetadp(OX) vboxnetflt(OX) vboxdrv(OX) rfcomm bnep bluetooth binfmt_misc ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 uvcvideo ipt_REJECT xt_LOG videobuf2_vmalloc videobuf2_memops xt_limit xt_tcpudp videobuf2_core xt_addrtype videodev nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack gpio_ich ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables coretemp kvm_intel snd_hda_codec_idt kvm snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event serio_raw snd_rawmidi lpc_ich snd_seq snd_seq_device snd_timer snd i915 mac_hid parport_pc video ppdev drm_kms_helper soundcore drm lp i2c_algo_bit parport e100 e1000 mii psmouse
Feb 17 06:51:40 coromandel kernel: [  217.116705] CPU: 1 PID: 1330 Comm: Xorg Tainted: G        W  OX 3.13.0-43-generic #72-Ubuntu
Comment 1 Maruthi Seshidhar 2015-02-17 02:02:31 UTC
Created attachment 113546
syslog file

Attached the complete syslog file.
Comment 2 Jani Nikula 2015-02-17 07:55:59 UTC
Please reproduce and attach resulting /sys/class/drm/card0/error
Comment 3 Maruthi Seshidhar 2015-02-17 11:29:14 UTC
Comment on attachment 113546
syslog file

The sysfs file /sys/class/drm/card0/error is an in-memory file.
Once the assertion is hit, the system hangs,
and I don't get a chance to collect it.
After reboot, I only see the stack dump in syslog.

Any pointers on how to collect the said file?
Comment 4 Maruthi Seshidhar 2015-02-17 11:31:21 UTC
The sysfs file /sys/class/drm/card0/error is an empty file after reboot.

Any pointers on how to collect the said file?
Comment 5 Paulo Zanoni 2015-02-23 19:32:49 UTC
(In reply to Maruthi Seshidhar from comment #4)
> The sysfs file /sys/class/drm/card0/error is an empty file after reboot.

Yes, that's expected.

> 
> Any pointers on how to collect the said file?

After the error happens, can you SSH into the affected machine from another machine and collect the file?

You could also try a cruder strategy, such as running a script that copies /sys/class/drm/card0/error to ~/error-$(date "+%H-%M").txt every minute. Then, when the hang happens, wait at least a minute, reboot the machine, and check the most recent non-empty files :)
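
The periodic-copy idea could look roughly like the sketch below. It is only an illustration, not a tested tool: the names `snapshot` and `poll` are made up here, the "skip if empty" check is an assumption about what the file contains when no error has been recorded, and the fsync call is there because a hard reset is expected shortly after the file is written.

```python
import os
import time
from datetime import datetime
from pathlib import Path
from typing import Optional

# Path taken from the kernel log message in this report.
ERROR_PATH = Path("/sys/class/drm/card0/error")


def snapshot(src: Path, dest_dir: Path, now: datetime) -> Optional[Path]:
    """Copy src to dest_dir/error-HH-MM.txt.

    Returns the new path, or None if src is unreadable or empty
    (assumption: an empty read means no error state was recorded).
    """
    try:
        data = src.read_bytes()
    except OSError:
        return None
    if not data.strip():
        return None
    dest = dest_dir / f"error-{now:%H-%M}.txt"
    with open(dest, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before a possible hard reset
    return dest


def poll(src: Path, dest_dir: Path, interval_s: float = 60.0) -> None:
    """Take a snapshot every interval_s seconds until interrupted."""
    while True:
        snapshot(src, dest_dir, datetime.now())
        time.sleep(interval_s)
```

To use it, call `poll(ERROR_PATH, Path.home())` before reproducing the hang; reading the sysfs file will probably require root. Because the file name only carries hour and minute, snapshots from a later day overwrite older ones, which matches the naming in the suggestion above.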
Comment 6 Ander Conselvan de Oliveira 2015-06-03 16:31:34 UTC
There were a few bug reports about issues with gen4 and Chrome that were fixed by Mesa 10.5. Since there is no error log attached and there has been no activity on this bug for a while, I'm presuming this is one of those issues.

If you still see this with an updated Mesa, please reopen this bug. In that case, also update to a recent kernel (with working reset and GPU recovery), so that your system survives the GPU hang and you can capture the error log.

