Bug 44992 - [SNB] RC6 hang-ups
Summary: [SNB] RC6 hang-ups
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Daniel Vetter
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-20 06:43 UTC by CC
Modified: 2017-07-24 23:03 UTC
6 users

See Also:
i915 platform:
i915 features:


Attachments
i915_error_state with kernel 3.1.6 (1.97 MB, application/octet-stream)
2012-01-20 06:44 UTC, CC
i915_error_state with kernel 3.2.1 (1.97 MB, application/octet-stream)
2012-01-20 06:44 UTC, CC
i915_error_state with drm-intel-fixes (2.05 MB, application/octet-stream)
2012-01-21 15:25 UTC, CC

Description CC 2012-01-20 06:43:14 UTC
When RC6 is enabled, the system freezes within a few minutes after boot. The screen blanks during these hang-ups. Most often, the system returns to a usable state within a few seconds.

Virtualization is disabled (and VT-d unavailable on the 2500K).

Tested kernels: 3.1.9 and 3.2.1


dmesg reports errors like these:

[   48.900000] WARNING: at drivers/gpu/drm/i915/i915_drv.c:387 __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]()
[   48.900002] Hardware name: To Be Filled By O.E.M.
[   48.900002] Modules linked in: ipv6 fuse ext2 snd_hda_codec_hdmi snd_hda_codec_realtek mei(C) joydev r8169 shpchp pci_hotplug usbhid hid snd_hda_intel iTCO_wdt mii iTCO_vendor_support i2c_i801 snd_hda_codec processor snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc psmouse serio_raw pcspkr evdev ext4 mbcache jbd2 crc16 xhci_hcd ehci_hcd usbcore i915 drm_kms_helper drm intel_agp i2c_algo_bit button intel_gtt i2c_core video sd_mod ahci libahci libata scsi_mod
[   48.900019] Pid: 623, comm: Xorg Tainted: G        WC  3.1.9-2-ARCH #1
[   48.900020] Call Trace:
[   48.900023]  [<ffffffff81061bef>] warn_slowpath_common+0x7f/0xc0
[   48.900025]  [<ffffffff81061c4a>] warn_slowpath_null+0x1a/0x20
[   48.900028]  [<ffffffffa00e0764>] __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]
[   48.900032]  [<ffffffffa015d2d5>] ring_write_tail+0x65/0x120 [i915]
[   48.900036]  [<ffffffffa01619bc>] render_ring_flush+0xbc/0xe0 [i915]
[   48.900040]  [<ffffffffa010b803>] i915_gem_flush_ring+0x43/0x250 [i915]
[   48.900044]  [<ffffffffa0112b50>] i915_gem_do_execbuffer.isra.7+0x1020/0x16d0 [i915]
[   48.900048]  [<ffffffffa01136bb>] i915_gem_execbuffer2+0x8b/0x240 [i915]
[   48.900051]  [<ffffffffa0098434>] drm_ioctl+0x3e4/0x4c0 [drm]
[   48.900053]  [<ffffffff810746cb>] ? recalc_sigpending+0x1b/0x50
[   48.900057]  [<ffffffffa0113630>] ? i915_gem_execbuffer+0x430/0x430 [i915]
[   48.900059]  [<ffffffff8101e9b1>] ? fpu_finit+0x21/0x40
[   48.900061]  [<ffffffff8116fddf>] do_vfs_ioctl+0x8f/0x500
[   48.900063]  [<ffffffff81014beb>] ? sys_rt_sigreturn+0x1eb/0x200
[   48.900064]  [<ffffffff811702e1>] sys_ioctl+0x91/0xa0
[   48.900066]  [<ffffffff8140c3c2>] system_call_fastpath+0x16/0x1b
[   48.900067] ---[ end trace 9a23b8b32b16a424 ]---

and these

[   53.163526] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   53.165046] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[   53.177356] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1593 at 1592, next 1594)
[   53.181979] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[   53.185522] [drm:init_ring_common] *ERROR* gen6 bsd ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[   53.188558] [drm:init_ring_common] *ERROR* blt ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[   55.330146] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   55.332202] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1594 at 1591, next 1595)
[   55.333258] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[   55.333260] [drm:i915_reset] *ERROR* Failed to reset chip.
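
For anyone triaging similar logs, here is a minimal sketch that pulls the hangcheck timestamps and the request sequence numbers out of dmesg text. It assumes the message wording matches the lines quoted above; the names summarize and SAMPLE are hypothetical and not part of any i915 tooling.

```python
import re

# Patterns copied from the i915 messages quoted in this report.
HANG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_elapsed\].*GPU hung"
)
WAIT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_wait_request\].*"
    r"awaiting (\d+) at (\d+), next (\d+)"
)


def summarize(dmesg_text):
    """Return (hang timestamps, [(time, awaiting, at, next), ...])."""
    hangs = [float(m.group(1)) for m in HANG_RE.finditer(dmesg_text)]
    waits = [
        (float(m.group(1)), int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in WAIT_RE.finditer(dmesg_text)
    ]
    return hangs, waits


# Sample input: four lines taken verbatim from the log above.
SAMPLE = """\
[   53.163526] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   53.177356] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1593 at 1592, next 1594)
[   55.330146] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   55.332202] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1594 at 1591, next 1595)
"""

if __name__ == "__main__":
    hangs, waits = summarize(SAMPLE)
    print("hang timestamps:", hangs)
    for t, awaiting, at, nxt in waits:
        # How far the reported position lags behind the awaited request.
        print(f"{t:.3f}s: awaiting {awaiting}, at {at}, {awaiting - at} behind")
```

To run it against a live system, the output of dmesg could be passed to summarize in place of SAMPLE.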
Comment 1 CC 2012-01-20 06:44:10 UTC
Created attachment 55833
i915_error_state with kernel 3.1.6
Comment 2 CC 2012-01-20 06:44:45 UTC
Created attachment 55834
i915_error_state with kernel 3.2.1
Comment 3 CC 2012-01-21 15:25:36 UTC
Created attachment 55937
i915_error_state with drm-intel-fixes
Comment 4 CC 2012-01-21 15:33:15 UTC
I've tried the latest drm-intel-fixes branch.

At first, it seemed to fix the problem. However, the hang-ups only occurred after a while, and then in quicker succession (a few minutes apart).

I tried booting a few times and noticed that the time of the first hang-up varies. I cross-checked that this is true for the other kernels, too. They don't hang so often in succession, however.


dmesg looks like this:

[  197.970046] WARNING: at drivers/gpu/drm/i915/i915_drv.c:413 __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]()
[  197.970048] Hardware name: To Be Filled By O.E.M.
[  197.970049] Modules linked in: ipv6 fuse ext2 snd_hda_codec_realtek joydev snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd psmouse soundcore shpchp pci_hotplug mei(C) iTCO_wdt iTCO_vendor_support usbhid hid r8169 snd_page_alloc mii evdev i2c_i801 serio_raw pcspkr processor ext4 mbcache jbd2 crc16 xhci_hcd ehci_hcd usbcore usb_common i915 drm_kms_helper drm intel_agp i2c_algo_bit button intel_gtt i2c_core video sd_mod ahci libahci libata scsi_mod
[  197.970070] Pid: 66, comm: kworker/u:5 Tainted: G        WC   3.2.0-drm-intel-fixes-07294-g8f0fc97 #1
[  197.970071] Call Trace:
[  197.970076]  [<ffffffff8105015f>] warn_slowpath_common+0x7f/0xc0
[  197.970078]  [<ffffffff810501ba>] warn_slowpath_null+0x1a/0x20
[  197.970082]  [<ffffffffa00e68d4>] __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]
[  197.970086]  [<ffffffffa00e70f8>] i915_write32+0x58/0x110 [i915]
[  197.970091]  [<ffffffffa0124838>] init_ring_common+0x38/0x310 [i915]
[  197.970095]  [<ffffffffa00f7f58>] ? i915_gem_reset+0xe8/0x210 [i915]
[  197.970099]  [<ffffffffa00f724f>] ? i915_gem_reset_fences+0x6f/0xc0 [i915]
[  197.970102]  [<ffffffff8123b1a0>] ? add_uevent_var+0x100/0x100
[  197.970106]  [<ffffffffa0125374>] init_render_ring+0x34/0x250 [i915]
[  197.970110]  [<ffffffffa00e7579>] i915_reset+0x3c9/0x580 [i915]
[  197.970114]  [<ffffffffa00ebc70>] ? i915_driver_irq_postinstall+0x190/0x190 [i915]
[  197.970117]  [<ffffffffa00ebd38>] i915_error_work_func+0xc8/0x110 [i915]
[  197.970121]  [<ffffffff8106cb76>] process_one_work+0x116/0x4d0
[  197.970123]  [<ffffffff8106d50e>] worker_thread+0x15e/0x350
[  197.970125]  [<ffffffff8106d3b0>] ? manage_workers.isra.29+0x230/0x230
[  197.970127]  [<ffffffff81072623>] kthread+0x93/0xa0
[  197.970130]  [<ffffffff814310e4>] kernel_thread_helper+0x4/0x10
[  197.970132]  [<ffffffff81072590>] ? kthread_freezable_should_stop+0x70/0x70
[  197.970134]  [<ffffffff814310e0>] ? gs_change+0x13/0x13
[  197.970135] ---[ end trace dd0298c7596a1528 ]---
[  197.973174] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[  197.976717] [drm:init_ring_common] *ERROR* gen6 bsd ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[  197.979756] [drm:init_ring_common] *ERROR* blt ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[  401.808581] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  401.810641] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 45985 at 45984, next 45986)
[  401.815809] ------------[ cut here ]------------
[  401.815819] WARNING: at drivers/gpu/drm/i915/i915_drv.c:413 __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]()
[  401.815820] Hardware name: To Be Filled By O.E.M.
[  401.815821] Modules linked in: ipv6 fuse ext2 snd_hda_codec_realtek joydev snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd psmouse soundcore shpchp pci_hotplug mei(C) iTCO_wdt iTCO_vendor_support usbhid hid r8169 snd_page_alloc mii evdev i2c_i801 serio_raw pcspkr processor ext4 mbcache jbd2 crc16 xhci_hcd ehci_hcd usbcore usb_common i915 drm_kms_helper drm intel_agp i2c_algo_bit button intel_gtt i2c_core video sd_mod ahci libahci libata scsi_mod
[  401.815842] Pid: 66, comm: kworker/u:5 Tainted: G        WC   3.2.0-drm-intel-fixes-07294-g8f0fc97 #1
[  401.815843] Call Trace:
[  401.815849]  [<ffffffff8105015f>] warn_slowpath_common+0x7f/0xc0
[  401.815851]  [<ffffffff810501ba>] warn_slowpath_null+0x1a/0x20
[  401.815855]  [<ffffffffa00e68d4>] __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]
[  401.815859]  [<ffffffffa00e70f8>] i915_write32+0x58/0x110 [i915]
[  401.815864]  [<ffffffffa0124838>] init_ring_common+0x38/0x310 [i915]
[  401.815868]  [<ffffffffa00f7f58>] ? i915_gem_reset+0xe8/0x210 [i915]
[  401.815872]  [<ffffffffa00f724f>] ? i915_gem_reset_fences+0x6f/0xc0 [i915]
[  401.815875]  [<ffffffff8123b1a0>] ? add_uevent_var+0x100/0x100
[  401.815879]  [<ffffffffa0125374>] init_render_ring+0x34/0x250 [i915]
[  401.815882]  [<ffffffffa00e7579>] i915_reset+0x3c9/0x580 [i915]
[  401.815886]  [<ffffffffa00ebc70>] ? i915_driver_irq_postinstall+0x190/0x190 [i915]
[  401.815890]  [<ffffffffa00ebd38>] i915_error_work_func+0xc8/0x110 [i915]
[  401.815893]  [<ffffffff8106cb76>] process_one_work+0x116/0x4d0
[  401.815895]  [<ffffffff8106d50e>] worker_thread+0x15e/0x350
[  401.815897]  [<ffffffff8106d3b0>] ? manage_workers.isra.29+0x230/0x230
[  401.815900]  [<ffffffff81072623>] kthread+0x93/0xa0
[  401.815903]  [<ffffffff814310e4>] kernel_thread_helper+0x4/0x10
[  401.815905]  [<ffffffff81072590>] ? kthread_freezable_should_stop+0x70/0x70
[  401.815907]  [<ffffffff814310e0>] ? gs_change+0x13/0x13
[  401.815908] ---[ end trace dd0298c7596a1529 ]---
[  401.818947] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[  401.822490] [drm:init_ring_common] *ERROR* gen6 bsd ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[  401.825527] [drm:init_ring_common] *ERROR* blt ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000


Whenever it doesn't hang immediately after booting up, dmesg shows an error similar to the above, except for the drm:init_ring_common part. I attached the i915_error_state captured with this kernel.
Comment 5 Daniel Vetter 2012-01-21 17:05:22 UTC
So even with the forcewake locking fixes, your gpu just disappears. And then we hit the wait_for_fifo WARN before the gpu is declared dead ...
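
A rough sketch for checking this in a log: the init_ring_common lines above report ctl, head, tail and start all as 00000000 for every ring. The helper below (a hypothetical name, not an i915 or intel-gpu-tools function) flags rings whose registers all read back as zero. Treating an all-zero readback as "the GPU is not responding" is an assumption, not something the driver states.

```python
import re

# Pattern copied from the init_ring_common messages quoted in this report.
RING_RE = re.compile(
    r"\[drm:init_ring_common\] \*ERROR\* (?P<ring>.+?) ring initialization failed "
    r"ctl (?P<ctl>[0-9a-f]{8}) head (?P<head>[0-9a-f]{8}) "
    r"tail (?P<tail>[0-9a-f]{8}) start (?P<start>[0-9a-f]{8})"
)


def rings_reading_zero(dmesg_text):
    """Map ring name -> True if ctl/head/tail/start all read back as 0."""
    result = {}
    for m in RING_RE.finditer(dmesg_text):
        regs = [int(m.group(k), 16) for k in ("ctl", "head", "tail", "start")]
        result[m.group("ring")] = all(r == 0 for r in regs)
    return result


# Sample input: three lines taken verbatim from the log in comment 4.
SAMPLE = """\
[  197.973174] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[  197.976717] [drm:init_ring_common] *ERROR* gen6 bsd ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
[  197.979756] [drm:init_ring_common] *ERROR* blt ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
"""

if __name__ == "__main__":
    for ring, all_zero in rings_reading_zero(SAMPLE).items():
        print(f"{ring}: all registers zero = {all_zero}")
```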
Comment 6 Ben Widawsky 2012-01-21 19:32:15 UTC
(In reply to comment #5)
> So even with the forcewake locking fixes, your gpu just disappears. And then we
> hit the wait_for_fifo WARN before the gpu is declared dead ...

I was considering an RFC patch to just enable forcewake always, and disable the wait_for_fifo WARN_ON. This is an issue on the simulator. That would be one thing to try.

The thing to try in the meantime is to reproduce with the forcewaked app from intel-gpu-tools.

By Monday, I'll try to update my patch to get us a bit more info.
Comment 7 CC 2012-01-24 05:44:32 UTC
I am unable to reproduce the bug with forcewaked running in the background.
Comment 8 CC 2012-01-24 08:01:08 UTC
I updated the BIOS; one of the update messages was "Modify default setting for iGPU voltage." Indeed, they changed the default setting from "auto" to "fixed 1.25V".

After trying to trigger the issue with both settings, it does seem to happen only for "auto" mode and not for "fixed 1.25V". Maybe the mainboard tries to be smart and cuts off the GPU when it thinks it's not in use. Is that reasonable?
Comment 9 Eugeni Dodonov 2012-01-24 09:22:23 UTC
Could you please specify the computer vendor and model?
Comment 10 CC 2012-01-24 09:30:03 UTC
Sure: the mainboard is the AsRock Z68 Pro3-M.
Comment 11 CC 2012-01-25 09:49:26 UTC
I guess the issue can be closed since it didn't happen again.

