Created attachment 65204 [details]
3rd dmesg

After some time using X, it freezes. Only mouse works and I need to ssh to restart the system.

Checking logs, I can see 200 of the following WARNING messages:

[89248.305606] ------------[ cut here ]------------
[89248.305624] WARNING: at drivers/gpu/drm/i915/i915_drv.c:398 gen6_gt_check_fifodbg.isra.3+0x40/0x50 [i915]()
[89248.305628] Hardware name:
[89248.305630] MMIO read or write has been dropped 3
[89248.305633] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek mei(C) snd_hda_intel snd_hda_codec snd_hwdep snd_pcm i915 snd_page_alloc snd_timer snd video i2c_algo_bit microcode ghash_clmulni_intel iTCO_wdt cryptd soundcore drm_kms_helper coretemp acpi_cpufreq joydev mperf psmouse serio_raw processor pcspkr button evdev iTCO_vendor_support drm shpchp pci_hotplug i2c_i801 i2c_core intel_agp intel_gtt e1000e crc32c_intel ext4 crc16 jbd2 mbcache usbhid hid sd_mod ahci xhci_hcd libahci libata scsi_mod ehci_hcd usbcore usb_common
[89248.305694] Pid: 469, comm: X Tainted: G WC 3.4.7-1-ARCH #1
[89248.305698] Call Trace:
[89248.305707] [<ffffffff810515bf>] warn_slowpath_common+0x7f/0xc0
[89248.305713] [<ffffffff810516b6>] warn_slowpath_fmt+0x46/0x50
[89248.305723] [<ffffffffa02c0490>] gen6_gt_check_fifodbg.isra.3+0x40/0x50 [i915]
[89248.305732] [<ffffffffa02c081e>] __gen6_gt_force_wake_put+0x1e/0x20 [i915]
[89248.305742] [<ffffffffa02c0d11>] i915_read32+0x131/0x150 [i915]
[89248.305755] [<ffffffffa02ffe90>] intel_ring_get_active_head+0x30/0x40 [i915]
[89248.305766] [<ffffffffa02ffee5>] gen6_ring_get_seqno+0x45/0x50 [i915]
[89248.305779] [<ffffffffa02d5fba>] i915_gem_throttle_ioctl+0xba/0x240 [i915]
[89248.305786] [<ffffffff811818a0>] ? __pollwait+0xf0/0xf0
[89248.305797] [<ffffffffa01cf483>] drm_ioctl+0x4c3/0x570 [drm]
[89248.305809] [<ffffffffa02d5f00>] ? i915_gem_busy_ioctl+0x170/0x170 [i915]
[89248.305817] [<ffffffff81246864>] ? timerqueue_del+0x34/0x90
[89248.305824] [<ffffffff81076f20>] ? __remove_hrtimer+0x60/0xc0
[89248.305830] [<ffffffff81180ab7>] do_vfs_ioctl+0x97/0x530
[89248.305836] [<ffffffff8105700c>] ? do_setitimer+0x1cc/0x260
[89248.305841] [<ffffffff81180fe9>] sys_ioctl+0x99/0xa0
[89248.305848] [<ffffffff8146aaa9>] system_call_fastpath+0x16/0x1b
[89248.305852] ---[ end trace 2e392e332536dc75 ]---
[89248.306807] ------------[ cut here ]------------

and finally, when it freezes, I get:

[90387.362163] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[90387.362167] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[90387.365284] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00000000 tail 00000000 start 00000000
[90401.649006] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[90401.649226] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00000000 tail 00000000 start 00000000
[90454.707303] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung

There is another bug 53169 that seems to be related, but can't tell. I'm attaching logs from 3 different times that this issue happened.
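For anyone comparing several dmesg attachments, the counts of dropped-MMIO warnings and hang reports can be tallied mechanically rather than by eye. The following is only a rough sketch: the function name `summarize_dmesg` and the dictionary keys are made up for illustration, and the match strings are copied from the log excerpts in this report.

```python
import re

# Match strings copied from the log excerpts in this report.
# The dictionary keys are illustrative labels, not kernel terminology.
PATTERNS = {
    "mmio_dropped": re.compile(r"MMIO read or write has been dropped"),
    "gpu_hung": re.compile(r"Hangcheck timer elapsed\.\.\. GPU hung"),
    "ring_init_failed": re.compile(r"render ring initialization failed"),
}


def summarize_dmesg(text):
    """Count how many lines of a dmesg dump match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Calling `summarize_dmesg(open("dmesg.txt").read())` on a saved log (the file name is hypothetical) returns one count per pattern, which makes it easier to compare runs with and without rc6.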
Created attachment 65206 [details] 3rd i915_error_state
Ok, added logs just for the last occurrence of the issue. Please let me know if you need more details.
Looks to be a similar bug to bug 50545. Does the issue no longer manifest if you disable rc6 by adding i915.i915_enable_rc6=0 to your kernel parameters?
The dmesg for bug 50545 seems different. Anyway, I'll try i915.i915_enable_rc6=0 and I'll let you know the results. Thanks for the fast response.
Hmm, I did a quick grep for the rc6+dropped mmio bug... The precise bug I was looking for is no longer in that list (and you are right, it is not bug 50545). Daniel is probably hiding it from me again...
The actual bug I intended to reference was bug 50619.
I have added i915.i915_enable_rc6=0 to my kernel parameters and so far it seems the issue is gone:

[root@odin ~]# uptime
 09:35:55 up 22:08, 2 users, load average: 3,10, 2,63, 2,60

Although I'm running a desktop, I understand this parameter will help to save some power, so I would like to know if/when this issue will be fixed.

I'm attaching the current dmesg and this is the current i915_error_state:

[root@odin etc]# cat /sys/kernel/debug/dri/0/i915_error_state
no error state collected

Thanks.
Created attachment 65280 [details] 4th dmesg
It seems that I spoke too soon. My box just hung. I'm attaching the logs.
Created attachment 65282 [details] This is my current sysctl
Created attachment 65283 [details] 5th Xorg
Created attachment 65285 [details] 5th dmesg
I've attached my sysctl.conf so you can check whether anything there could still lead to this. Afterwards I checked my grub setup and found two options I added some time ago due to an error I was receiving related to mtrr:

enable_mtrr_cleanup mtrr_spare_reg_nr=1

Should I try to run without them?

Xorg version:
=============
X.Org X Server 1.12.3
Release Date: 2012-07-09
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.4.4-3-ARCH x86_64
Current Operating System: Linux odin 3.4.7-1-ARCH #1 SMP PREEMPT Sun Jul 29 22:02:56 CEST 2012 x86_64
Kernel command line: root=/dev/disk/by-uuid/8e2f80f0-c4ee-44b2-a446-39f0de4ff9a6 ro vga=773 enable_mtrr_cleanup mtrr_spare_reg_nr=1
Build Date: 09 July 2012 03:59:39PM
Current version of pixman: 0.26.2
Before reporting problems, check http://wiki.x.org to make sure that you have the latest version.

xorg-bdftopcf 1.0.3-2
xorg-font-util 1.3.0-1
xorg-font-utils 7.6-3
xorg-fonts-alias 1.0.2-2
xorg-fonts-encodings 1.0.4-3
xorg-fonts-misc 1.0.1-2
xorg-iceauth 1.0.5-1
xorg-mkfontdir 1.0.7-1
xorg-mkfontscale 1.1.0-1
xorg-server 1.12.3-1
xorg-server-common 1.12.3-1
xorg-server-utils 7.6-3
xorg-sessreg 1.0.7-1
xorg-setxkbmap 1.3.0-1
xorg-utils 7.6-8
xorg-xauth 1.0.7-1
xorg-xbacklight 1.1.2-3
xorg-xcmsdb 1.0.4-1
xorg-xdpyinfo 1.3.0-1
xorg-xdriinfo 1.0.4-3
xorg-xev 1.2.0-1
xorg-xgamma 1.0.5-1
xorg-xhost 1.0.5-1
xorg-xinit 1.3.2-1
xorg-xinput 1.6.0-1
xorg-xkbcomp 1.2.4-1
xorg-xlsatoms 1.1.1-1
xorg-xlsclients 1.1.2-2
xorg-xmessage 1.0.3-2
xorg-xmodmap 1.0.7-1
xorg-xprop 1.2.1-1
xorg-xrandr 1.3.5-1
xorg-xrdb 1.0.9-2
xorg-xrefresh 1.0.4-3
xorg-xset 1.2.2-1
xorg-xsetroot 1.1.0-3
xorg-xvinfo 1.1.1-3
xorg-xwininfo 1.1.2-1
intel-dri 8.0.4-2
libva-driver-intel 1.0.18-1
xf86-video-intel 2.20.2-2

i915_error_state:
=================
no error state collected

uptime:
=======
 10:00:04 up 22:32, 3 users, load average: 1,22, 1,09, 1,77

Please let me know if you need anything else. Thanks again.
Hm, I don't know whether you can change that with sysctl; i915_enable_rc6 is a module option ... Maybe double-check in /sys/module/i915/parameters/i915_enable_rc6 whether it works?
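One quick way to see whether a module option was actually passed at boot is to look for a `module.param=value` token on the kernel command line, such as the "Kernel command line:" entry the X server writes to its log. A minimal sketch of that check follows; the helper name `module_param_from_cmdline` is hypothetical, and keeping the last occurrence is an assumption about precedence rather than documented behaviour.

```python
def module_param_from_cmdline(cmdline, module, param):
    """Return the value given as `module.param=value` on a kernel
    command line, or None if no such token is present."""
    key = f"{module}.{param}"
    value = None
    for token in cmdline.split():
        name, sep, val = token.partition("=")
        if sep and name == key:
            # Assumption: if the option is repeated, keep the last one.
            value = val
    return value
```

Run against the command line quoted in the Xorg log above (`root=... ro vga=773 enable_mtrr_cleanup mtrr_spare_reg_nr=1`), this returns `None`, which is consistent with the option not having reached the driver at that point.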
And, guess what? You are right :-S I've changed my grub setup and let's see what will happen.
I forgot to add:

[root@odin ~]# cat /sys/module/i915/parameters/i915_enable_rc6
-1

This is what it was before.
and this is after setting up the kernel:

[root@odin ~]# cat /sys/module/i915/parameters/i915_enable_rc6
0
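The before/after check above can also be scripted, for example to log the effective setting together with each dmesg capture. The sketch below makes a few assumptions: the function names are invented for illustration, the sysfs path is the one shown in the comments above, and the reading of negative values as "driver default" is my understanding of this kernel series, not something confirmed in this thread. Reading the file may require root.

```python
from pathlib import Path

# Path as shown in the shell output in this thread.
RC6_PARAM = Path("/sys/module/i915/parameters/i915_enable_rc6")


def read_rc6_setting(path=RC6_PARAM):
    """Return the integer stored in the parameter file, or None if the
    file is missing, unreadable, or does not contain an integer."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_rc6(value):
    """Turn the raw parameter value into a short human-readable note."""
    if value is None:
        return "unknown (parameter file missing or unreadable)"
    if value == 0:
        return "rc6 disabled"
    if value < 0:
        # Assumption: a negative value means the driver picks its default.
        return "driver default (not overridden)"
    return f"rc6 explicitly enabled (value {value})"
```

With the values reported above, `describe_rc6(-1)` gives the "driver default" note and `describe_rc6(0)` gives "rc6 disabled".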
Well that just sunk my best theory!
Chris, I've changed the parameter correctly this time and so far, after 6 hours, there is no single WARNING message in dmesg. So, the boat is still floating.
(In reply to comment #19)
> Chris, I've changed the parameter correctly this time and so far, after 6
> hours, there is no single WARNING message in dmesg.
>
> So, the boat is still floating.

Can you please confirm that you do not see any hangs whilst disabling rc6?
With some odd exceptions, the error state seems to indicate the GPU was idle when the hangcheck elapsed. Or is it just me?
Considering that some of the missed writes were to update the ring tail pointer, trying to guess what state the GPU is in seems fraught.
I confirm no errors since the driver parameter change:

isaque@odin:~$ uptime
 07:52:03 up 4 days, 7:22, 2 users, load average: 0,62, 1,19, 1,57

Now, what do I gain/lose by having this parameter disabled? Thx.
Created attachment 65494 [details]
6th dmesg

Just for your record, in case you want to check anything else.
Sounds like we can safely coalesce these bug reports into the original "rc6 explodes randomly".

*** This bug has been marked as a duplicate of bug 50619 ***