Bug 87550 - [gen4] GPU hang in Chrome
Summary: [gen4] GPU hang in Chrome
Status: RESOLVED DUPLICATE of bug 80568
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
 
Reported: 2014-12-21 12:27 UTC by wolf.duttlinger
Modified: 2015-04-30 18:16 UTC

Description wolf.duttlinger 2014-12-21 12:27:14 UTC
When I play certain videos in the Chrome browser, the screen suddenly goes blank. Neither Ctrl-Alt-Backspace (twice) nor Ctrl-Alt-Delete works; I have to do a hard reset.

Here is the output from /var/log/messages:
Dec 21 12:45:38 otherworld kernel: [drm] stuck on render ring
Dec 21 12:45:38 otherworld kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Dec 21 12:45:38 otherworld kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Dec 21 12:45:38 otherworld kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Dec 21 12:45:38 otherworld kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Dec 21 12:45:38 otherworld kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Dec 21 12:45:39 otherworld kernel: [drm:i915_reset] *ERROR* Failed to reset chip: -110
Dec 21 12:45:39 otherworld kernel: [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit banging on pin 2
Dec 21 12:46:00 otherworld bluetoothd[764]: Endpoint unregistered: sender=:1.102 path=/MediaEndpoint/A2DPSource
Dec 21 12:46:00 otherworld bluetoothd[764]: Endpoint unregistered: sender=:1.102 path=/MediaEndpoint/A2DPSink
Dec 21 12:45:59 otherworld kernel: ------------[ cut here ]------------
Dec 21 12:45:59 otherworld kernel: WARNING: CPU: 1 PID: 22057 at drivers/gpu/drm/i915/intel_display.c:928 assert_pll+0x68/0x70 [i915]()
Dec 21 12:45:59 otherworld kernel: PLL state assertion failure (expected on, current off)
Dec 21 12:45:59 otherworld kernel: Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev snd_usb_audio media snd_usbmidi_lib snd_rawmidi snd_seq_device bridge stp llc xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_LOG xt_connlimit xt_realm xt_addrtype ip_set_hash_ip xt_comment xt_recent xt_nat ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_defrag_ipv6 xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables rfcomm ctr ccm af_packet vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep snd_hda_codec_analog snd_hda_codec_generic iTCO_wdt iTCO_vendor_support ppdev joydev coretemp kvm_intel arc4 kvm iwl4965 iwlegacy mac80211 microcode serio_raw btusb bluetooth i2c_i801 cfg80211 lpc_ich snd_hda_intel snd_hda_codec snd_hwdep 6lowpan_iphc snd_pcm e1000e thinkpad_acpi shpchp snd_timer ptp thermal pps_core snd soundcore rfkill battery ac parport_pc parport tpm_tis tpm cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq processor hdaps input_polldev mmc_block sdhci_pci sdhci mmc_core nvram evdev loop ipv6 autofs4 ehci_pci ehci_hcd pcmcia uhci_hcd usbcore ide_pci_generic piix ide_core firewire_ohci 
yenta_socket firewire_core pcmcia_rsrc pcmcia_core ata_generic pata_acpi crc
Dec 21 12:46:00 otherworld kernel: _itu_t usb_common i915 video button i2c_algo_bit drm_kms_helper drm i2c_core ata_piix
Dec 21 12:45:59 otherworld kernel: CPU: 1 PID: 22057 Comm: X Tainted: G           O 3.14.24-desktop-1.mga4 #1
Dec 21 12:45:59 otherworld kernel: Hardware name: LENOVO 767374G/767374G, BIOS 7NETB9WW (2.19 ) 11/27/2008
Dec 21 12:45:59 otherworld kernel:  0000000000000009 ffff8800ba735958 ffffffff81633d8c ffff8800ba7359a0
Dec 21 12:45:59 otherworld kernel:  ffff8800ba735990 ffffffff81065dad 0000000000000001 ffff8800ba3d2000
Dec 21 12:45:59 otherworld kernel:  ffff880136bcc000 0000000000000000 0000000000000000 ffff8800ba7359f0
Dec 21 12:45:59 otherworld kernel: Call Trace:
Dec 21 12:45:59 otherworld kernel:  [<ffffffff81633d8c>] dump_stack+0x45/0x56
Dec 21 12:45:59 otherworld kernel:  [<ffffffff81065dad>] warn_slowpath_common+0x7d/0xa0
Dec 21 12:45:59 otherworld kernel:  [<ffffffff81065e1c>] warn_slowpath_fmt+0x4c/0x50
Dec 21 12:45:59 otherworld kernel:  [<ffffffffa00ff9ec>] ? gen4_read32+0x3c/0xb0 [i915]
Dec 21 12:45:59 otherworld kernel:  [<ffffffffa00c1748>] assert_pll+0x68/0x70 [i915]
Dec 21 12:45:59 otherworld kernel:  [<ffffffffa00c6cff>] intel_crtc_load_lut+0x16f/0x1d0 [i915]
Dec 21 12:45:59 otherworld kernel:  [<ffffffff81639522>] ? mutex_lock+0x12/0x2f
Dec 21 12:45:59 otherworld kernel:  [<ffffffffa0089383>] drm_fb_helper_setcmap+0x373/0x450 [drm_kms_helper]
Dec 21 12:45:59 otherworld kernel:  [<ffffffff813f05ba>] fb_set_cmap+0x6a/0x140
Dec 21 12:45:59 otherworld kernel:  [<ffffffff813f7e8c>] fbcon_set_palette+0x13c/0x170
Dec 21 12:45:59 otherworld kernel:  [<ffffffff813fb7a2>] fbcon_switch+0x3b2/0x550
Dec 21 12:45:59 otherworld kernel:  [<ffffffff81463dd9>] redraw_screen+0x189/0x240
Dec 21 12:45:59 otherworld kernel:  [<ffffffff813f8b6a>] fbcon_blank+0x20a/0x2d0
Dec 21 12:45:59 otherworld kernel:  [<ffffffff814648f4>] do_unblank_screen+0xb4/0x1e0
Dec 21 12:45:59 otherworld kernel:  [<ffffffff8145b9b3>] vt_ioctl+0x10c3/0x1140
Dec 21 12:45:59 otherworld kernel:  [<ffffffff8155e0bf>] ? sock_destroy_inode+0x2f/0x40
Dec 21 12:45:59 otherworld kernel:  [<ffffffff8144f45d>] tty_ioctl+0x26d/0xb60
Dec 21 12:45:59 otherworld kernel:  [<ffffffff811ab108>] do_vfs_ioctl+0x2d8/0x4b0
Dec 21 12:45:59 otherworld kernel:  [<ffffffff811ab361>] SyS_ioctl+0x81/0xa0
Dec 21 12:45:59 otherworld kernel:  [<ffffffff816438bd>] system_call_fastpath+0x1a/0x1f
Dec 21 12:45:59 otherworld kernel: ---[ end trace 8846edd19d4bc313 ]---
Dec 21 12:45:59 otherworld kernel: ------------[ cut here ]------------
Dec 21 12:45:59 otherworld kernel: WARNING: CPU: 1 PID: 22057 at drivers/gpu/drm/i915/intel_display.c:928 assert_pll+0x68/0x70 [i915]()
Dec 21 12:45:59 otherworld kernel: PLL state assertion failure (expected on, current off)
:
:
:
The warning above is repeated 5-6 times within the same second.
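
For anyone triaging a similar log, here is a minimal sketch that counts the markers seen above. The substrings are copied from the kernel messages quoted in this report; the function name and the dictionary keys are made up for illustration, and other kernel versions may word these messages differently.

```python
# Substrings copied from the kernel messages quoted in this report.
# The key names are hypothetical labels chosen for this sketch.
MARKERS = {
    "hang": "[drm] stuck on",
    "reset_failed": "Failed to reset chip",
    "pll_assert": "PLL state assertion failure",
}


def summarize_gpu_log(lines):
    """Count how many lines contain each marker substring."""
    counts = {name: 0 for name in MARKERS}
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

It can be fed an open log file directly, for example `summarize_gpu_log(open("/var/log/messages", errors="replace"))`.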

The file /sys/class/drm/card0/error mentioned above has size 0 after reboot.
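
sysfs is an in-memory filesystem, so the error state does not survive a reboot and has to be copied while the hung system is still reachable (for example over ssh). A minimal sketch of such a copy follows; the function name, the destination directory and the output file name are assumptions for illustration, and it needs permission to read the sysfs file.

```python
import time
from pathlib import Path


def save_gpu_error_state(src="/sys/class/drm/card0/error", dest_dir="/var/tmp"):
    """Copy the GPU error state to persistent storage.

    Returns the destination path, or None if there was nothing to save.
    The content is read directly instead of trusting the reported file size.
    """
    src_path = Path(src)
    if not src_path.exists():
        return None
    data = src_path.read_bytes()
    if not data:
        return None
    # Hypothetical naming scheme: timestamped file in dest_dir.
    dest = Path(dest_dir) / time.strftime("gpu-error-%Y%m%d-%H%M%S.txt")
    dest.write_bytes(data)
    return dest
```

The saved file can then be attached to the bug report, as the kernel message asks.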
Comment 1 wolf.duttlinger 2015-01-19 10:45:28 UTC
Additional information.

The screen going black repeatedly happens when watching YouTube videos in Chromium. It does NOT happen when watching them in Firefox.

Furthermore - to clarify the bug:

Normal X session (KDE)
Chromium
Start YouTube video
Video starts playing, music plays
Screen goes black
Music continues to play
Ctrl-Alt-Backspace twice
Music stops - I assume the X server restarted
BUT: the screen stays black.
There is disk activity for some time - it LOOKS like the system is fully running and only the display driver has stopped...
I need to power off and on again

Thanks
Wolf
Comment 2 Matt Turner 2015-03-06 23:26:51 UTC
I suspect this may be another duplicate of bug 80568, fixed (worked around) by this commit:

commit c4fd0c9052dd391d6f2e9bb8e6da209dfc7ef35b
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sat Jan 17 23:21:15 2015 -0800

    i965: Work around mysterious Gen4 GPU hangs with minimal state changes.
    
    Gen4 hardware appears to GPU hang frequently when using Chromium, and
    also when running 'glmark2 -b ideas'.  Most of the error states contain
    3DPRIMITIVE commands in quick succession, with very few state packets
    between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.
    
    I trimmed an apitrace of the glmark2 hang down to two draw calls with a
    glUniformMatrix4fv call between the two.  Either draw by itself works
    fine, but together, they hang the GPU.  Removing the glUniform call
    makes the hangs disappear.  In the hardware state, this translates to
    removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.
    
    Flushing before emitting CONSTANT_BUFFER packets also appears to make
    the hangs disappear.  I observed a slowdown in glxgears by doing it all
    the time, so I've chosen to only do it when BRW_NEW_BATCH and
    BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
    already flushed the whole pipeline).
    
    I'd much rather understand the problem, but at this point, I don't see
    how we'd ever be able to track it down further.  We have no real tools,
    and the hardware people moved on years ago.  I've analyzed 20+ error
    states and read every scrap of documentation I could find.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Acked-by: Matt Turner <mattst88@gmail.com>
    Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
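
The flush condition described in the commit message can be modelled roughly as below. This is only a sketch of the decision as the message words it, not the actual Mesa code: the bit values are invented for illustration (the real BRW_NEW_* flags live in the i965 driver), and the function name is hypothetical.

```python
# Invented bit values for illustration only; the real BRW_NEW_* flags
# are defined in the i965 driver and may differ.
BRW_NEW_BATCH = 1 << 0
BRW_NEW_PSP = 1 << 1


def needs_flush_before_constant_buffer(dirty_flags):
    """Return True when neither BRW_NEW_BATCH nor BRW_NEW_PSP is set.

    Per the commit message, those two flags mean the pipeline was already
    flushed or a CS_URB_STATE change happened, so the extra flush is skipped.
    """
    return (dirty_flags & (BRW_NEW_BATCH | BRW_NEW_PSP)) == 0
```

Restricting the flush to this case is what avoids the glxgears slowdown the message mentions for flushing every time.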

It's in git, and backports are in Mesa 10.4.x for x > 3. Please try upgrading to >10.4.3. If it's resolved by such an upgrade, please mark as a duplicate of bug 80568.
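
A quick way to check an installed version against that threshold, as a sketch: it assumes a plain dotted numeric version string (no "-devel" or distribution suffix), and the function names are made up for illustration. The version string itself would come from something like the "OpenGL version string" line of glxinfo.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '10.4.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_newer_than(version, baseline="10.4.3"):
    """Return True if version is strictly newer than baseline.

    Comparing tuples of ints avoids the string-comparison trap where
    '10.4.10' would sort before '10.4.3'.
    """
    return parse_version(version) > parse_version(baseline)
```

Note that this only tests "newer than 10.4.3"; it says nothing about which 10.3.x releases, if any, carry the backport.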
Comment 3 Matt Turner 2015-04-30 18:16:57 UTC
No reply. Marking as duplicate.

*** This bug has been marked as a duplicate of bug 80568 ***

