I'm trying to get page flipping working on all my Intel hardware, and I'm hitting a hang on my 965GME hardware (AVALUE EMX 965GME motherboard, Core 2 Duo T7250 CPU).

Unlike the 945 hang (being chased in bug 28788), it's not immediate and easy to reproduce - it takes around 2 days of continuous uptime (60Hz display), and doesn't tickle the hangcheck timer, so intel_error_decode doesn't find anything.

With the aid of "echo t > /proc/sysrq-trigger", I've been able to determine that X is stalled in the kernel in i915_gem_wait_for_pending_flip.

I'm using:

* Fedora kernel 2.6.34-45.fc14.i686.PAE (i686 architecture)
* xf86-video-intel as of git 28c0ca676c47e7e38fabdd9ef24a70bd26701f33
* xserver as of git 3b3c77b87070ddcdbb2acb114a81628485e7a129
* mesa as of git 7a9246c5d72290ed8455a426801b85b54374e102
* libdrm as of git 726210f87d558d558022f35bc8c839e798a19f0c

The trace from the kernel is:

Xorg          S 00006585     0  1403      1 0x00400000
 f405ddc0 00203086 04182167 00006585 c0a4fd40 c0a4fd40 c0a4fd40 c0a4fd40
 f401e8ac c0a4fd40 c0a4fd40 000280aa 00000000 f42b8c00 00006585 f401e600
 00000000 f401e600 f405de20 f60b10a4 f405de40 f8067133 0000002e 80000000
Call Trace:
 [<f8067133>] i915_gem_do_execbuffer+0x378/0xbf8 [i915]
 [<f8062c62>] ? list_move_tail+0x18/0x1b [i915]
 [<c04c8f56>] ? __kmalloc+0xfc/0x108
 [<c045212d>] ? autoremove_wake_function+0x0/0x2f
 [<f8067a4f>] i915_gem_execbuffer2+0x9c/0xe2 [i915]
 [<f7f4aa8c>] drm_ioctl+0x237/0x317 [drm]
 [<f80679b3>] ? i915_gem_execbuffer2+0x0/0xe2 [i915]
 [<c04d1976>] ? fsnotify_modify+0x4f/0x5a
 [<c04dc1c9>] vfs_ioctl+0x27/0x91
 [<f7f4a855>] ? drm_ioctl+0x0/0x317 [drm]
 [<c04dc76a>] do_vfs_ioctl+0x48e/0x4cc
 [<c040767f>] ? __switch_to+0x125/0x155
 [<c0437e3b>] ? finish_task_switch+0x34/0x92
 [<c0786093>] ? schedule+0x585/0x5d9
 [<c04d2662>] ? vfs_writev+0x36/0x44
 [<c04dc7e9>] sys_ioctl+0x41/0x61
 [<c040885f>] sysenter_do_call+0x12/0x28
 [<c0780000>] ? init_intel+0x140/0x355

I've confirmed that I'm still seeing interrupts from the device:

# grep i915 /proc/interrupts && sleep 1 && grep i915 /proc/interrupts
 26:   17245844          1   PCI-MSI-edge      i915
 26:   17245904          1   PCI-MSI-edge      i915

and (while hung):

# ~/vbltest
trying to load module i915...success.
starting count: 27352347
freq: 60.08Hz
freq: 59.80Hz
freq: 59.80Hz
freq: 59.80Hz

Restarting X shows that the GPU isn't hung, but I get lots of:

[176995.150] (WW) intel(0): get vblank counter failed: Invalid argument
[176995.150] (WW) intel(0): first get vblank counter failed: Invalid argument
[176995.172] (WW) intel(0): get vblank counter failed: Invalid argument
[176995.172] (WW) intel(0): first get vblank counter failed: Invalid argument

in the X log.

# ~/modetest -s 12:1920x1200 -v # connector 12 is DVI-D
trying to load module i915...success.
setting mode 1920x1200 on connector 12, crtc 4
freq: 60.36Hz
freq: 59.80Hz
freq: 59.80Hz
freq: 59.80Hz
freq: 59.80Hz
freq: 59.80Hz
freq: 59.80Hz
freq: 59.80Hz
freq: 59.80Hz

works too - I get rapid flicking between colourful screen and grey screen.

If I then try to restart X11 (after running modetest), I get a complete system hang - no response to a PS/2 keyboard, or on the network.

I'm going to try leaving the system running modetest instead of X and the GL compositor, to see if that suffers a similar fate.
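The two /proc/interrupts samples above differ by 60 in the first counter column over roughly one second, which matches one interrupt per vblank on a 60Hz display. As a rough sketch of that arithmetic (the helper names sum_irq_counts and irqs_per_second are made up for illustration and are not part of any tool mentioned here):

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line, e.g.
 * " 26:   17245844          1   PCI-MSI-edge      i915".
 * Returns 0 on success, -1 if the line has no "N:" prefix.
 * Hypothetical helper, written only for this illustration. */
int sum_irq_counts(const char *line, unsigned long *total)
{
    const char *p;

    assert(line != NULL && total != NULL);
    p = strchr(line, ':');
    if (p == NULL)
        return -1;
    p++;

    *total = 0;
    for (;;) {
        char *end;

        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;              /* reached the chip/handler name columns */
        *total += strtoul(p, &end, 10);
        p = end;
    }
    return 0;
}

/* Interrupt rate between two samples taken `seconds` apart. */
double irqs_per_second(unsigned long before, unsigned long after,
                       double seconds)
{
    return (double)(after - before) / seconds;
}
```

Feeding it the two lines quoted above gives totals 60 apart; with the one-second sleep that is about 60 interrupts per second, so the device is still raising interrupts at the display rate while X is stuck.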
Just tried running modetest, quitting it, and restarting it - that immediately jams:

# ./modetest -s 12:1920x1200 -v
trying to load module i915...success.
setting mode 1920x1200 on connector 12, crtc 4
select timed out or error (ret 0)
select timed out or error (ret 0)

and display stuck on the colourful screen.
Does the modetest trace look the same as the earlier X trace?

These messages:

[176995.150] (WW) intel(0): get vblank counter failed: Invalid argument
[176995.150] (WW) intel(0): first get vblank counter failed: Invalid argument

look like the kernel is rejecting vblank event requests for some reason. A lack of space (i.e. unconsumed events) should result in an -ENOMEM return though; are there any messages in dmesg indicating why the call failed?
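To spell out the distinction being drawn: "Invalid argument" is the standard message text for EINVAL, so the log lines point at a rejected request rather than at the -ENOMEM case described above. A small sketch of that triage follows; the enum and function names are hypothetical and do not come from the X driver or libdrm.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of a failed vblank request, for illustration. */
enum vblank_failure {
    VBLANK_REJECTED,   /* EINVAL: the request itself was refused */
    VBLANK_NO_SPACE,   /* ENOMEM: the case expected for unconsumed events */
    VBLANK_OTHER       /* anything else */
};

/* Map a positive errno value onto the cases discussed in this comment. */
enum vblank_failure classify_vblank_errno(int err)
{
    assert(err >= 0);
    switch (err) {
    case EINVAL:
        return VBLANK_REJECTED;
    case ENOMEM:
        return VBLANK_NO_SPACE;
    default:
        return VBLANK_OTHER;
    }
}
```

With the errors in the X log, this would land in the VBLANK_REJECTED branch, which is why dmesg is the next place to look for a reason.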
modetest never gives me a helpful trace - it's always stuck in select when it dies. There are no messages in dmesg when things go wrong to suggest why it's failing. It's just dead.

I've also found a way to kill the system (no network, no local console) - run "./modetest -s 12:1920x1200 -v", leave it for a few seconds, and press enter to have it shut down nicely. Run it again, getting output like:

# ./modetest -s 12:1920x1200 -v
trying to load module i915...success.
setting mode 1920x1200 on connector 12, crtc 4
select timed out or error (ret 0)
select timed out or error (ret 0)
select timed out or error (ret 0)
select timed out or error (ret 0)
select timed out or error (ret 0)
select timed out or error (ret 0)
select timed out or error (ret 0)
select timed out or error (ret 0)
select timed out or error (ret 0)

With modetest running in one SSH session, run vbltest in another session. Watch the machine disappear out from under you - even magic SysRq is gone.

Some tidbits that might help:

* To get identical frequency outputs from vbltest and modetest, I need to run vbltest -s.
* If I run vbltest -s instead of vbltest in my "kill the world" setup, it doesn't die.
* If I run vbltest -s while the first modetest is running, the second one succeeds.

I wonder if we're not requesting the right IRQs...
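For anyone reading the "select timed out or error (ret 0)" lines above: select() returns 0 when the timeout expires with nothing readable, and -1 on an actual error, so "ret 0" means no event arrived on the descriptor within the timeout. The following is only a sketch of that pattern, not modetest's actual code; the function name and the WAIT_TIMED_OUT sentinel are made up for the example.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

#define WAIT_TIMED_OUT (-2)   /* hypothetical sentinel, not from modetest */

/* Wait up to timeout_ms for fd to become readable, then read from it.
 * Returns the number of bytes read, WAIT_TIMED_OUT if select() returned 0,
 * or -1 if select() or read() failed (errno says why). */
ssize_t read_event_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set fds;
    struct timeval tv;
    int ret;

    assert(buf != NULL && fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000L;

    ret = select(fd + 1, &fds, NULL, NULL, &tv);
    if (ret == 0)
        return WAIT_TIMED_OUT;    /* the "ret 0" case: nothing arrived */
    if (ret < 0)
        return -1;                /* select itself failed */

    return read(fd, buf, len);    /* fd is readable, so this won't block */
}
```

In the failing runs the wait would keep hitting the timeout branch, which fits flip completion events never being delivered rather than the call itself erroring out.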
With the fixes in bug #28788 applied (apply all three patches, reverse the order of finish and prepare as in comment 34 on that bug), I no longer get hangs.

The patches are:

https://bugs.freedesktop.org/attachment.cgi?id=36463
https://bugs.freedesktop.org/attachment.cgi?id=36464
https://bugs.freedesktop.org/attachment.cgi?id=35551

And then, in https://bugs.freedesktop.org/attachment.cgi?id=36464 change the following hunk:

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2479be0..a846cd8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -940,22 +940,30 @@ irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)
 		if (HAS_BSD(dev) && (iir & I915_BSD_USER_INTERRUPT))
 			DRM_WAKEUP(&dev_priv->bsd_ring.irq_queue);
 
-		if (iir & I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT)
+		if (iir & I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT) {
 			intel_prepare_page_flip(dev, 0);
+			if (dev_priv->flip_pending_is_done)
+				intel_finish_page_flip_plane(dev, 0);
+		}
 
-		if (iir & I915_DISPLAY_PLANE_B_FLIP_PENDING_INTERRUPT)
+		if (iir & I915_DISPLAY_PLANE_B_FLIP_PENDING_INTERRUPT) {
+			if (dev_priv->flip_pending_is_done)
+				intel_finish_page_flip_plane(dev, 1);
 			intel_prepare_page_flip(dev, 1);
+		}
 
 		if (pipea_stats & vblank_status) {
 			vblank++;
 			drm_handle_vblank(dev, 0);
-			intel_finish_page_flip(dev, 0);
+			if (!dev_priv->flip_pending_is_done)
+				intel_finish_page_flip(dev, 0);
 		}
 
 		if (pipeb_stats & vblank_status) {
 			vblank++;
 			drm_handle_vblank(dev, 1);
-			intel_finish_page_flip(dev, 1);
+			if (!dev_priv->flip_pending_is_done)
+				intel_finish_page_flip(dev, 1);
 		}
 
 		if ((pipea_stats & I915_LEGACY_BLC_EVENT_STATUS) ||

to

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2479be0..a846cd8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -940,22 +940,30 @@ irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)
 		if (HAS_BSD(dev) && (iir & I915_BSD_USER_INTERRUPT))
 			DRM_WAKEUP(&dev_priv->bsd_ring.irq_queue);
 
-		if (iir & I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT)
+		if (iir & I915_DISPLAY_PLANE_A_FLIP_PENDING_INTERRUPT) {
 			intel_prepare_page_flip(dev, 0);
+			if (dev_priv->flip_pending_is_done)
+				intel_finish_page_flip_plane(dev, 0);
+		}
 
-		if (iir & I915_DISPLAY_PLANE_B_FLIP_PENDING_INTERRUPT)
+		if (iir & I915_DISPLAY_PLANE_B_FLIP_PENDING_INTERRUPT) {
 			intel_prepare_page_flip(dev, 1);
+			if (dev_priv->flip_pending_is_done)
+				intel_finish_page_flip_plane(dev, 1);
+		}
 
 		if (pipea_stats & vblank_status) {
 			vblank++;
 			drm_handle_vblank(dev, 0);
-			intel_finish_page_flip(dev, 0);
+			if (!dev_priv->flip_pending_is_done)
+				intel_finish_page_flip(dev, 0);
 		}
 
 		if (pipeb_stats & vblank_status) {
 			vblank++;
 			drm_handle_vblank(dev, 1);
-			intel_finish_page_flip(dev, 1);
+			if (!dev_priv->flip_pending_is_done)
+				intel_finish_page_flip(dev, 1);
 		}
 
 		if ((pipea_stats & I915_LEGACY_BLC_EVENT_STATUS) ||
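The only difference between the two hunks is the plane B branch: finish was being called before prepare. A toy user-space model of why that order could matter is sketched below. It rests on an assumption that is not stated in this report, namely that the finish step does nothing unless the prepare step has already marked the flip as pending; the struct and function names are invented for the model and are not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for per-plane flip bookkeeping; not the real i915 structures. */
struct flip_model {
    bool queued;     /* a flip has been submitted and awaits completion */
    bool pending;    /* the prepare step has seen the flip-pending interrupt */
    int completed;   /* number of flips the finish step has completed */
};

/* Models prepare: mark the queued flip as pending. */
void model_prepare(struct flip_model *m)
{
    assert(m != NULL);
    if (m->queued)
        m->pending = true;
}

/* Models finish. ASSUMPTION: it is a no-op unless prepare ran first. */
void model_finish(struct flip_model *m)
{
    assert(m != NULL);
    if (!m->queued || !m->pending)
        return;
    m->queued = false;
    m->pending = false;
    m->completed++;
}

/* Handle one flip-pending interrupt in the given call order.
 * Returns true if the flip was completed by this interrupt. */
bool model_flip_irq(struct flip_model *m, bool prepare_first)
{
    int before;

    assert(m != NULL);
    before = m->completed;
    if (prepare_first) {
        model_prepare(m);
        model_finish(m);
    } else {
        model_finish(m);
        model_prepare(m);
    }
    return m->completed > before;
}
```

Under that assumption, the prepare-then-finish order completes the flip in the same interrupt, while finish-then-prepare leaves it queued and pending, and anything waiting on that flip stays blocked.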
For reference, the first two patches and the correction to the second patch are included in 2.6.35-rc4 under the following commits:

83f7fd0 drm/i915: don't queue flips during a flip pending event
1afe3e9 drm/i915: gen3 page flipping fixes
70565d0 drm/i915: fix page flip finish vs. prepare on plane B

And the final patch is currently committed to drm-intel-next as the following:

f602afd drm/i915: Include instdone[1] in hangcheck