Created attachment 107089 [details]
GPU crash dump

I upgraded a stock Debian Wheezy system kernel to a vanilla 3.16.3 downloaded from kernel.org and my system was slow to boot. My logs showed a 3 minute hang before printing this message:

Sep 29 12:57:16 notleych-linux org.kde.powerdevil.backlighthelper: QDBusConnection: system D-Bus connection created before QCoreApplication. Application may misbehave.
Sep 29 12:57:22 notleych-linux kernel: [ 198.748705] [drm] stuck on render ring
Sep 29 12:57:22 notleych-linux kernel: [ 198.749260] [drm] GPU HANG: ecode 0:0x85fffff8, in kwin [3405], reason: Ring hung, action: reset
Sep 29 12:57:22 notleych-linux kernel: [ 198.749263] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Sep 29 12:57:22 notleych-linux kernel: [ 198.749264] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Sep 29 12:57:22 notleych-linux kernel: [ 198.749265] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Sep 29 12:57:22 notleych-linux kernel: [ 198.749266] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Sep 29 12:57:22 notleych-linux kernel: [ 198.749267] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Sep 29 12:57:22 notleych-linux kernel: [ 198.768720] ------------[ cut here ]------------
Sep 29 12:57:22 notleych-linux kernel: [ 198.768733] WARNING: CPU: 1 PID: 3008 at drivers/gpu/drm/drm_irq.c:774 send_vblank_event+0x32/0xce [drm]()
Sep 29 12:57:22 notleych-linux kernel: [ 198.768734] Modules linked in: des_generic ecb md4 cifs binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc nls_utf8 nls_cp437 vfat fat loop joydev hid_generic usbhid hid x86_pkg_temp_thermal coretemp kvm_intel kvm ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd i915 lrw gf128mul glue_helper drm_kms_helper drm iTCO_wdt iTCO_vendor_support ehci_pci ehci_hcd usbcore acpi_cpufreq evdev lpc_ich psmouse usb_common i2c_i801 mfd_core serio_raw processor i2c_algo_bit microcode i2c_core snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic pcspkr dcdbas snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm video button snd_seq snd_timer snd_seq_device parport_pc parport thermal_sys snd soundcore tpm_tis tpm ext4 crc16 jbd2 mbcache raid1 md_mod sg sd_mod crc_t10dif crct10dif_common sr_mod cdrom crc32c_intel ahci libahci libata scsi_mod e1000e ptp pps_core
Sep 29 12:57:22 notleych-linux kernel: [ 198.768775] CPU: 1 PID: 3008 Comm: Xorg Not tainted 3.16.3 #1
Sep 29 12:57:22 notleych-linux kernel: [ 198.768776] Hardware name: Dell Inc. Precision T1600/06NWYK, BIOS A10 02/21/2012
Sep 29 12:57:22 notleych-linux kernel: [ 198.768777] 0000000000000000 0000000000000009 ffffffff8139ae68 0000000000000000
Sep 29 12:57:22 notleych-linux kernel: [ 198.768779] ffffffff8103c193 ffff88030b89f000 ffffffffa032a6f5 ffff88031c85eb70
Sep 29 12:57:22 notleych-linux kernel: [ 198.768781] ffff88002fb0be40 ffff88030bbe7cd8 000000000000027e ffff88031c85e800
Sep 29 12:57:22 notleych-linux kernel: [ 198.768783] Call Trace:
Sep 29 12:57:22 notleych-linux kernel: [ 198.768788] [<ffffffff8139ae68>] ? dump_stack+0x41/0x51
Sep 29 12:57:22 notleych-linux kernel: [ 198.768792] [<ffffffff8103c193>] ? warn_slowpath_common+0x78/0x90
Sep 29 12:57:22 notleych-linux kernel: [ 198.768797] [<ffffffffa032a6f5>] ? send_vblank_event+0x32/0xce [drm]
Sep 29 12:57:22 notleych-linux kernel: [ 198.768801] [<ffffffffa032a6f5>] ? send_vblank_event+0x32/0xce [drm]
Sep 29 12:57:22 notleych-linux kernel: [ 198.768805] [<ffffffffa032aa75>] ? drm_send_vblank_event+0x51/0x5a [drm]
Sep 29 12:57:22 notleych-linux kernel: [ 198.768818] [<ffffffffa03a03e4>] ? intel_crtc_page_flip+0x3b2/0x3eb [i915]
Sep 29 12:57:22 notleych-linux kernel: [ 198.768824] [<ffffffffa03351bd>] ? drm_mode_page_flip_ioctl+0x1dc/0x27d [drm]
Sep 29 12:57:22 notleych-linux kernel: [ 198.768828] [<ffffffffa0327fe0>] ? drm_ioctl+0x27a/0x3c0 [drm]
Sep 29 12:57:22 notleych-linux kernel: [ 198.768833] [<ffffffffa0334fe1>] ? drm_mode_gamma_get_ioctl+0xb7/0xb7 [drm]
Sep 29 12:57:22 notleych-linux kernel: [ 198.768836] [<ffffffff811223b8>] ? do_vfs_ioctl+0x3ed/0x436
Sep 29 12:57:22 notleych-linux kernel: [ 198.768839] [<ffffffff8111522a>] ? vfs_read+0xb7/0xf7
Sep 29 12:57:22 notleych-linux kernel: [ 198.768841] [<ffffffff8112244a>] ? SyS_ioctl+0x49/0x77
Sep 29 12:57:22 notleych-linux kernel: [ 198.768843] [<ffffffff8139f312>] ? system_call_fastpath+0x16/0x1b
Sep 29 12:57:22 notleych-linux kernel: [ 198.768844] ---[ end trace 4e4656dbeea8452e ]---
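For anyone triaging similar reports, the fields of the "GPU HANG" line (error code, process name, PID, reason, action) can be pulled out of the log with a short script. This is only a minimal sketch: it assumes the message wording seen in this 3.16.3 log, which other kernel versions may not share, and the names GPU_HANG_RE and parse_gpu_hang are hypothetical helpers, not part of any existing tool.

```python
import re

# Pattern for the i915 "GPU HANG" line as worded in this report (kernel 3.16.3).
# Assumption: other kernel versions may format this message differently.
GPU_HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-fx:]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the hang fields as a dict, or None if the line does not match."""
    match = GPU_HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])  # PID is the only numeric field
    return fields


# The hang line from the log in this report.
line = (
    "Sep 29 12:57:22 notleych-linux kernel: [ 198.749260] [drm] GPU HANG: "
    "ecode 0:0x85fffff8, in kwin [3405], reason: Ring hung, action: reset"
)
print(parse_gpu_hang(line))
```

The ecode, together with the crash dump from /sys/class/drm/card0/error, is what lets one hang report be compared against another.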
Should be fixed with

commit c4d69da167fa967749aeb70bc0e94a457e5d00c1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Sep 8 14:25:41 2014 +0100

    drm/i915: Evict CS TLBs between batches

    Running igt, I was encountering the invalid TLB bug on my 845g, despite that it was using the CS workaround. Examining the w/a buffer in the error state, showed that the copy from the user batch into the workaround itself was suffering from the invalid TLB bug (the first cacheline was broken with the first two words reversed). Time to try a fresh approach. This extends the workaround to write into each page of our scratch buffer in order to overflow the TLB and evict the invalid entries. This could be refined to only do so after we update the GTT, but for simplicity, we do it before each batch.

    I suspect this supersedes our current workaround, but for safety keep doing both.

    v2: The magic number shall be 2.

    This doesn't conclusively prove that it is the mythical TLB bug we've been trying to workaround for so long, that it requires touching a number of pages to prevent the corruption indicates to me that it is TLB related, but the corruption (the reversed cacheline) is more subtle than a TLB bug, where we would expect it to read the wrong page entirely.

    Oh well, it prevents a reliable hang for me and so probably for others as well.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>

I believe.
(In reply to comment #1)
> Should be fixed with
>
> commit c4d69da167fa967749aeb70bc0e94a457e5d00c1
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Mon Sep 8 14:25:41 2014 +0100
>
> drm/i915: Evict CS TLBs between batches
>

Ah, replied to the wrong bug. Sorry.
Dear Reporter, This Mesa bug has been in the "NEEDINFO" status for over 60 days. I am closing this bug based on lack of response but feel free to reopen if resolution is still needed. Please ensure you're supplying the correct information as requested. Thank you.