Created attachment 66646 [details] dmesg output While running X, if the CPU gets a little bit of load (such are running a kernel build), I get hangs and sometimes lockups. The dmesg has multiple errors of the type: [drm:__gen6_gt_force_wake_mt_get] *ERROR* Force wake wait timed out [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out Followed eventually by a stack trace from generated by WARNING: at drivers/gpu/drm/i915/intel_pm.c:4009 This is with kernel 3.6.0-rc4, but I see similar errors with 3.5.2 and 3.5.3 (Fedora's packaged version in these instances). The distro used is Fedora 17. Motherboard is MSI Z77A-G43 with an i7 CPU. I can reproduce it every time by starting X11 and putting a little load on the CPU -- I've done it with both a kernel build and with sys_basher -f. Attached is dmesg output with drm.debug=0x06
Your machine is very unhappy. Can you try http://cgit.freedesktop.org/~danvet/drm-intel and the drm-intel-next-queued branch for some more forcewake tweaks?
Created attachment 66701 [details] dmesg output from drm-intel-next-queued (8c3f929)
I'm testing kernel 3.6.0-rc4 currently on a ThinkPad t430s and getting the same message during bootup, but no "hangs".
Created attachment 66844 [details] dmesg output from archlinux with kernel 3.6.0-rc4
Comment on attachment 66844 [details] dmesg output from archlinux with kernel 3.6.0-rc4 earlier kernels are not affected
Has any tried drm-intel-next-queued with the new forcewake tweaks?
(In reply to comment #6) > Has any tried drm-intel-next-queued with the new forcewake tweaks? Yes, in fact I uploaded the dmesg output from that a couple days ago.
I must have read something wrong though. dinq (In reply to comment #7) > (In reply to comment #6) > > Has any tried drm-intel-next-queued with the new forcewake tweaks? > > Yes, in fact I uploaded the dmesg output from that a couple days ago. I must be parsing something wrong in the output then. Sep 5 20:44:47 ciclamino kernel: [ 237.470324] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request. Sep 5 20:44:47 ciclamino kernel: [ 237.472317] [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out Sep 5 20:44:49 ciclamino kernel: [ 239.356918] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request. Sep 5 20:44:49 ciclamino kernel: [ 239.358911] [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out Sep 5 20:44:50 ciclamino kernel: [ 240.577420] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request. Sep 5 20:44:50 ciclamino kernel: [ 240.579414] [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out As you see, we have 3 forcewake timeouts, and 3 thread_c0 timeouts. That should add up to 7.5ms of waiting in dinq. Yet somehow we're done in 3 (the old timeout value). Can you double check that the kernel you are running has https://patchwork.kernel.org/patch/1396301/ in it?
(In reply to comment #8) > > As you see, we have 3 forcewake timeouts, and 3 thread_c0 timeouts. That should > add up to 7.5ms of waiting in dinq. Yet somehow we're done in 3 (the old > timeout value). Can you double check that the kernel you are running has > https://patchwork.kernel.org/patch/1396301/ in it? > The HEAD of my tree for that kernel was 8c3f92, which does include this patch in it. I'll double check to make sure I didn't boot into something else (it'll be few hours before I can get to the machine). From the dmesg saying 3.6.0-rc3+ I'm pretty confident that's right since I've got no other messing around with 3.6-rc3. Also, would it help to include my .config?
Ok, I've confirmed the dmesg that I posted on 2012-09-06 is indeed from a kernel built with commit 8c3f929 as the HEAD. In case it's relevant, I'll upload the .config for that too.
Created attachment 66970 [details] config used for kernel commit 8c3f929
Chris, Daniel: do you have any idea why the timeouts are seeming to be less than we expect?
Since jiffies might lapse just on the next instruction, I guess we might need to add +1 there. And wait_for_atomic_us is a bit bongo-hits since we'll wait for quite a while there ... Maybe this also explains the occasional spurious timeout for forcewake at boot-up.
(In reply to comment #12) > Chris, Daniel: do you have any idea why the timeouts are seeming to be less > than we expect? The timeout values are 2ms. Look: Sep 5 20:44:47 ciclamino kernel: [ 237.470324] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to ack request. Sep 5 20:44:47 ciclamino kernel: [ 237.472317] [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out The only operation between those two messages is the __gen6_gt_wait_for_thread_c0, and from the timing we can see that we spun for 2ms waiting for c0. Unless you are complaining that we only spun for 1.997ms (give or take a couple of ns to advance to the function call and variability in print the message). (In reply to comment #13) > Maybe this also explains the occasional spurious timeout for forcewake at > boot-up. On IVB? We have the issue of waiting for the old ack to clear when probing MT, the warning is bogus afaict.
(In reply to comment #14) > (In reply to comment #12) > > Chris, Daniel: do you have any idea why the timeouts are seeming to be less > > than we expect? > > The timeout values are 2ms. Look: > > Sep 5 20:44:47 ciclamino kernel: [ 237.470324] > [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake to > ack request. > Sep 5 20:44:47 ciclamino kernel: [ 237.472317] > [drm:__gen6_gt_wait_for_thread_c0] *ERROR* GT thread status wait timed out > > The only operation between those two messages is the > __gen6_gt_wait_for_thread_c0, and from the timing we can see that we spun for > 2ms waiting for c0. Unless you are complaining that we only spun for 1.997ms > (give or take a couple of ns to advance to the function call and variability in > print the message). No, just me being stupid again. I was misinterpreting the timestamp format because I'd been looking at some BIOS timestamps which printed MS as the base. > > > (In reply to comment #13) > > Maybe this also explains the occasional spurious timeout for forcewake at > > boot-up. > > On IVB? We have the issue of waiting for the old ack to clear when probing MT, > the warning is bogus afaict. Are you talking about the very first one? Anything but the very first one shouldn't be bogus... Even the first one last time I looked shouldn't have been bogus either. I guess to be totally correct we should set forcewake_count=1 up until we enable rc6 (and leave it as 1 always when rc6 is disabled).
Looking at this again, this really does look like the System Agent dies (with the GT no longer coming out of rc6) and then once our fifo fills up we drop all subsequent writes. Bill, can you try appending i915.i915_enable_rc6=0 to your boot parameters? That should prevent the hangs, and then experiment with i915.i915_enable_rc6=1 and i915.i915_enable_rc6=2?
I don't have a machine built with this same motherboard any more so I don't have the test platform.
I guess we'll have to close this then as unreproducible due to lack of testplatform. Thanks anyway for reporting this issue.
Have hardware where the problem occurs (ThinkPad T430, Intel HD4000 graphics, i5-3210). I do not get hangs, but the message is displayed every time after I boot the system (using grub2-efi). This is the message: [drm:__gen6_gt_force_wake_mt_get] *ERROR* Force wake wait timed out. Not sure if pertinent, but when using UEFI shell instead of efibootmgr entry to launch grub.efi, the problem disappears.
(In reply to comment #19) > Have hardware where the problem occurs (ThinkPad T430, Intel HD4000 > graphics, i5-3210). > > I do not get hangs, but the message is displayed every time after I boot the > system (using grub2-efi). > > This is the message: > > [drm:__gen6_gt_force_wake_mt_get] *ERROR* Force wake wait timed out. > > Not sure if pertinent, but when using UEFI shell instead of efibootmgr entry > to launch grub.efi, the problem disappears. Once? That is not the issue (and is harmless), the problem occurs when you see these after boot/resume.
Ok, closing again since the drive-by-REOPEN by btby@gmx.us is a different (and harmless) issue.
I can follow up on this one. My system either gets stuck on boot or will come up and will die shortly after first CPU load. syslog is filled with high volume of drm errors, cf. below. System is Ivy i5-3570K. I am using branch drm-intel-testing at 4d8bd4f511f00cd7396736e3ebb9c1ffd939d341 RC6 turned off: i915.i915_enable_rc6=0 I did not see any improvement with drm-intel-next-queued. On successful boot prior to death I get Nov 15 08:06:02 i3v kernel: [ 42.933743] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:03 i3v kernel: [ 43.360168] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:03 i3v kernel: [ 43.515674] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:03 i3v kernel: [ 43.884449] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:04 i3v kernel: [ 44.316207] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:04 i3v kernel: [ 44.885198] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:04 i3v kernel: [ 45.016668] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:05 i3v kernel: [ 45.216785] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:05 i3v kernel: [ 45.435612] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:07 i3v kernel: [ 47.887441] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:07 i3v kernel: [ 47.918632] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:07 i3v kernel: [ 47.925687] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:10 i3v kernel: [ 50.886023] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:10 i3v kernel: [ 50.909453] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 15 08:06:10 i3v kernel: [ 50.920799] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear.
Please try the experimental patch at https://patchwork.kernel.org/patch/1751371/
The patch did not change the situation. I addition, here are some interesting lines from the syslog: Nov 16 17:12:17 i3v kernel: [ 4.335928] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 16 17:12:17 i3v kernel: [ 4.335945] ------------[ cut here ]------------ Nov 16 17:12:17 i3v kernel: [ 4.335962] WARNING: at drivers/gpu/drm/i915/intel_pm.c:4140 gen6_gt_check_fifodbg+0x41/0x60 [i915]() Nov 16 17:12:17 i3v kernel: [ 4.335963] Hardware name: Z68MA-D2H-B3 Nov 16 17:12:17 i3v kernel: [ 4.335963] MMIO read or write has been dropped 3 Nov 16 17:12:17 i3v kernel: [ 4.335964] Modules linked in: dvb_core(+) snd_hda_codec cxd2099(C) snd_hwdep i915(+) drm_kms_helper drm snd_pcm kvm_intel snd_seq_midi snd_rawmidi kvm snd_seq_midi_event snd_seq hid_generic usbhid ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts snd_timer snd_seq_device gf128mul snd hid soundcore i2c_algo_bit snd_page_alloc microcode mxm_wmi serio_raw mei wmi mac_hid sbs sbshc it87 hwmon_vid video lpc_ich coretemp lp parport r8169 Nov 16 17:12:17 i3v kernel: [ 4.335992] Pid: 501, comm: modprobe Tainted: G C 3.7.0-rc4+ #2 Nov 16 17:12:17 i3v kernel: [ 4.335993] Call Trace: Nov 16 17:12:17 i3v kernel: [ 4.335999] [<ffffffff81056fef>] warn_slowpath_common+0x7f/0xc0 Nov 16 17:12:17 i3v kernel: [ 4.336001] [<ffffffff810570e6>] warn_slowpath_fmt+0x46/0x50 Nov 16 17:12:17 i3v kernel: [ 4.336004] [<ffffffff8167ae39>] ? _raw_spin_unlock_irqrestore+0x19/0x30 Nov 16 17:12:17 i3v kernel: [ 4.336025] [<ffffffffa02709b1>] gen6_gt_check_fifodbg+0x41/0x60 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336033] [<ffffffffa0270a0d>] __gen6_gt_force_wake_mt_put+0x1d/0x20 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336039] [<ffffffffa0270a74>] gen6_gt_force_wake_put+0x44/0x60 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336047] [<ffffffffa02755cd>] init_ring_common+0x21d/0x3a0 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336054] [<ffffffffa0275785>] init_render_ring+0x35/0x2f0 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336060] [<ffffffffa0275faa>] intel_init_ring_buffer+0x24a/0x350 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336067] [<ffffffffa027769d>] intel_init_render_ring_buffer+0x13d/0x3b0 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336076] [<ffffffffa023d8d7>] i915_gem_init_hw+0x57/0x1a0 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336083] [<ffffffffa023daae>] i915_gem_init+0x8e/0x140 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336091] [<ffffffffa022a4fe>] i915_driver_load+0xdce/0xf00 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336099] [<ffffffffa03b4c91>] drm_get_pci_dev+0x191/0x2b0 [drm] Nov 16 17:12:17 i3v kernel: [ 4.336109] [<ffffffffa0280546>] i915_pci_probe+0x7a/0x82 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336111] [<ffffffff8134b759>] local_pci_probe+0x79/0x100 Nov 16 17:12:17 i3v kernel: [ 4.336113] [<ffffffff8134ce59>] pci_device_probe+0x109/0x130 Nov 16 17:12:17 i3v kernel: [ 4.336116] [<ffffffff8141459b>] driver_probe_device+0x7b/0x240 Nov 16 17:12:17 i3v kernel: [ 4.336118] [<ffffffff8141480b>] __driver_attach+0xab/0xb0 Nov 16 17:12:17 i3v kernel: [ 4.336120] [<ffffffff81414760>] ? driver_probe_device+0x240/0x240 Nov 16 17:12:17 i3v kernel: [ 4.336121] [<ffffffff814129b6>] bus_for_each_dev+0x56/0x90 Nov 16 17:12:17 i3v kernel: [ 4.336123] [<ffffffff814140ce>] driver_attach+0x1e/0x20 Nov 16 17:12:17 i3v kernel: [ 4.336125] [<ffffffff81413c40>] bus_add_driver+0x190/0x290 Nov 16 17:12:17 i3v kernel: [ 4.336126] [<ffffffff81414d6a>] driver_register+0x7a/0x160 Nov 16 17:12:17 i3v kernel: [ 4.336128] [<ffffffff8134bdb9>] __pci_register_driver+0x49/0x50 Nov 16 17:12:17 i3v kernel: [ 4.336133] [<ffffffffa03b4eca>] drm_pci_init+0x11a/0x130 [drm] Nov 16 17:12:17 i3v kernel: [ 4.336135] [<ffffffffa015d000>] ? 0xffffffffa015cfff Nov 16 17:12:17 i3v kernel: [ 4.336142] [<ffffffffa015d066>] i915_init+0x66/0x68 [i915] Nov 16 17:12:17 i3v kernel: [ 4.336145] [<ffffffff8100207f>] do_one_initcall+0x3f/0x170 Nov 16 17:12:17 i3v kernel: [ 4.336148] [<ffffffff810bbc1e>] sys_init_module+0xbe/0x220 Nov 16 17:12:17 i3v kernel: [ 4.336150] [<ffffffff81683699>] system_call_fastpath+0x16/0x1b Nov 16 17:12:17 i3v kernel: [ 4.336151] ---[ end trace db45fed16f075059 ]--- Nov 16 17:12:17 i3v kernel: [ 4.343920] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 16 17:12:17 i3v kernel: [ 4.350441] type=1400 audit(1353082332.193:8): apparmor="STATUS" operation="profile_load" name="/usr/sbin/ntpd" pid=743 comm="apparmor_parser" Nov 16 17:12:17 i3v kernel: [ 4.351912] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 16 17:12:17 i3v kernel: [ 4.359903] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear. Nov 16 17:12:17 i3v kernel: [ 4.367896] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear.
For testing purposes I have swapped CPUs, out i5-3570K, in i3-2105, here the same kernel runs 'fine' (*). So it is closely related to Ivy, and it is not so bad in Sandy. (*) Even on Sandy I do have a hung GPU message and I do see a white stripe picture on startup, which did not appear in older releases.
(In reply to comment #21) > Ok, closing again since the drive-by-REOPEN by btby@gmx.us is a different > (and harmless) issue. Okay, this seem to be printed on every t430s ThinkPad. If that is harmless, the message shouldn't print *ERROR*. Because an *ERROR* is not harmless and pancis the user (of courese). Note: The message changed a little bit with kernel 3.7-rc7 "[ 0.574350] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear." Also dimming backlight doesn't work with 3.7 anymore, but this shouldn't be releated.
(In reply to comment #26) > (In reply to comment #21) > > Ok, closing again since the drive-by-REOPEN by btby@gmx.us is a different > > (and harmless) issue. > > Okay, this seem to be printed on every t430s ThinkPad. If that is harmless, > the message shouldn't print *ERROR*. Because an *ERROR* is not harmless and > pancis the user (of courese). And we have a patch to fix things (it's still an error in our code, but one we can fix up afterwards - the real fix prevents the actual error). tf5a@arcor.de can you please open a new bug for your issue, with the i915_error_state attached when your machine hangs? It's rather likely that you have a different issue (just because it's the same log message doesn't mean it's the same bug), and conflating different bugs in the same report is a good recipe for disaster. Since the issue of the original can still not be reproduced due to lack of the exact machine, closing again. For everyone else: If your name is not "Bill Pemberton" or you don't have the exact same machine as he had, please don't reopen this bug report, but instead file your own new bug. I know that this against all common advice for not filing duplicates, but gfx bugs are different (and much harder to triage correctly than your average bug report). Thanks.
Patch citation for reference: commit 16995a9fe140802c026c2ce17bf7e232f86d57ab Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Oct 18 11:46:10 2012 +0100 drm/i915: Clear FORCEWAKE when taking over from BIOS
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.