Created attachment 37390 [details] i915_error_state I am using the following software: - Fedora kernel-2.6.33.6-147 - Fedora 13 - libdrm-2.4.21-2.fc13 - xorg-x11-drv-intel-2.12 (from rawhide) - xorg-x11-server-Xorg-1.8.2-2.fc13 uname: Linux air 2.6.33.6-147.fc13.x86_64 #1 SMP Thu Jul 8 18:16:22 PDT 2010 x86_64 GNU/Linux lspci: 00:00.0 Host bridge: Intel Corporation Core Processor DRAM Controller (rev 12) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, fast devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel driver in use: agpgart-intel 00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 12) (prog-if 00 [VGA controller]) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, fast devsel, latency 0, IRQ 47 Memory at d0000000 (64-bit, non-prefetchable) [size=4M] Memory at c0000000 (64-bit, prefetchable) [size=256M] I/O ports at 5058 [size=8] Expansion ROM at <unassigned> [disabled] Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [d0] Power Management version 2 Capabilities: [a4] PCI Advanced Features Kernel driver in use: i915 Kernel modules: i915 00:19.0 Ethernet controller: Intel Corporation 82577LM Gigabit Network Connection (rev 06) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, fast devsel, latency 0, IRQ 48 Memory at d4700000 (32-bit, non-prefetchable) [size=128K] Memory at d472a000 (32-bit, non-prefetchable) [size=4K] I/O ports at 5020 [size=32] Capabilities: [c8] Power Management version 2 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [e0] PCI Advanced Features Kernel driver in use: e1000e Kernel modules: e1000e 00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06) (prog-if 20 [EHCI]) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, medium devsel, latency 0, IRQ 16 Memory at d4729000 (32-bit, 
non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Capabilities: [98] PCI Advanced Features Kernel driver in use: ehci_hcd 00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio (rev 06) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, fast devsel, latency 0, IRQ 50 Memory at d4720000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 2 Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Virtual Channel Capabilities: [130] Root Complex Link Kernel driver in use: HDA Intel Kernel modules: snd-hda-intel 00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 06) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 Memory behind bridge: d4600000-d46fffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: Hewlett-Packard Company Device 7008 Capabilities: [a0] Power Management version 2 Kernel driver in use: pcieport Kernel modules: shpchp 00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 06) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=02, subordinate=42, sec-latency=0 I/O behind bridge: 00003000-00004fff Memory behind bridge: d0600000-d45fffff Prefetchable memory behind bridge: 00000000d4900000-00000000d4afffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: Hewlett-Packard Company Device 7008 Capabilities: [a0] Power Management version 2 Kernel driver in use: pcieport Kernel modules: shpchp 00:1c.3 PCI bridge: Intel 
Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 (rev 06) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=43, subordinate=43, sec-latency=0 Memory behind bridge: d0500000-d05fffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: Hewlett-Packard Company Device 7008 Capabilities: [a0] Power Management version 2 Kernel driver in use: pcieport Kernel modules: shpchp 00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06) (prog-if 20 [EHCI]) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, medium devsel, latency 0, IRQ 20 Memory at d4728000 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Capabilities: [98] PCI Advanced Features Kernel driver in use: ehci_hcd 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev a6) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=44, subordinate=48, sec-latency=32 I/O behind bridge: 00002000-00002fff Memory behind bridge: d0400000-d04fffff Prefetchable memory behind bridge: 00000000d8000000-00000000dbffffff Capabilities: [50] Subsystem: Hewlett-Packard Company Device 7008 00:1f.0 ISA bridge: Intel Corporation Mobile 5 Series Chipset LPC Interface Controller (rev 06) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, medium devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=10 <?> Kernel modules: iTCO_wdt 00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06) (prog-if 01 [AHCI 1.0]) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 46 I/O ports at 5048 [size=8] I/O ports at 5064 [size=4] I/O ports at 5040 [size=8] I/O ports 
at 5060 [size=4] I/O ports at 5000 [size=32] Memory at d4727000 (32-bit, non-prefetchable) [size=2K] Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [70] Power Management version 3 Capabilities: [a8] SATA HBA v1.0 Capabilities: [b0] PCI Advanced Features Kernel driver in use: ahci 00:1f.6 Signal processing controller: Intel Corporation 5 Series/3400 Series Chipset Thermal Subsystem (rev 06) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, fast devsel, latency 0, IRQ 10 Memory at d4726000 (64-bit, non-prefetchable) [size=4K] Capabilities: [50] Power Management version 3 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- 43:00.0 Network controller: Intel Corporation Centrino Advanced-N 6200 (rev 35) Subsystem: Intel Corporation Centrino Advanced-N 6200 2x2 AGN Flags: bus master, fast devsel, latency 0, IRQ 49 Memory at d0500000 (64-bit, non-prefetchable) [size=8K] Capabilities: [c8] Power Management version 3 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [e0] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number 00-23-14-ff-ff-77-aa-48 Kernel driver in use: iwlagn Kernel modules: iwlagn 44:06.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 06) (prog-if 10 [OHCI]) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, medium devsel, latency 64, IRQ 20 Memory at d0401000 (32-bit, non-prefetchable) [size=2K] Capabilities: [dc] Power Management version 2 Kernel driver in use: firewire_ohci Kernel modules: firewire-ohci 44:06.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 25) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, medium devsel, latency 64, IRQ 22 Memory at d0403000 (32-bit, non-prefetchable) [size=256] Capabilities: [80] Power Management version 2 Kernel driver in use: sdhci-pci Kernel modules: sdhci-pci 44:06.2 System peripheral: Ricoh Co Ltd 
R5C843 MMC Host Controller (rev 14) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, medium devsel, latency 0, IRQ 11 Memory at d0402000 (32-bit, non-prefetchable) [size=256] Capabilities: [80] Power Management version 2 44:06.3 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev bb) Subsystem: Hewlett-Packard Company Device 7008 Flags: bus master, medium devsel, latency 168, IRQ 22 Memory at d0400000 (32-bit, non-prefetchable) [size=4K] Bus: primary=44, secondary=45, subordinate=48, sec-latency=176 Memory window 0: d8000000-dbfff000 (prefetchable) Memory window 1: dc000000-dffff000 I/O window 0: 00002000-000020ff I/O window 1: 00002400-000024ff 16-bit legacy interface ports at 0001 Kernel driver in use: yenta_cardbus Kernel modules: yenta_socket ff:00.0 Host bridge: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers (rev 02) Subsystem: Intel Corporation Device 8086 Flags: bus master, fast devsel, latency 0 ff:00.1 Host bridge: Intel Corporation Core Processor QuickPath Architecture System Address Decoder (rev 02) Subsystem: Intel Corporation Device 8086 Flags: bus master, fast devsel, latency 0 ff:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 02) Subsystem: Intel Corporation Device 8086 Flags: bus master, fast devsel, latency 0 ff:02.1 Host bridge: Intel Corporation Core Processor QPI Physical 0 (rev 02) Subsystem: Intel Corporation Device 8086 Flags: bus master, fast devsel, latency 0 ff:02.2 Host bridge: Intel Corporation Core Processor Reserved (rev 02) Subsystem: Intel Corporation Device 8086 Flags: bus master, fast devsel, latency 0 ff:02.3 Host bridge: Intel Corporation Core Processor Reserved (rev 02) Subsystem: Intel Corporation Device 8086 Flags: bus master, fast devsel, latency 0 Other relevant info: VGA BIOS: https://bugs.freedesktop.org/attachment.cgi?id=37080 The hang happened while rss-glx-skyrocket (a rss-glx screensaver) was running. 
dmesg:

[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 953036 at 931817)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
(plenty of these in a loop)

I have attached a dump of /sys/kernel/debug/dri/0/i915_error_state.

Phil.
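For anyone comparing several of these reports, here is a minimal sketch for pulling the numbers out of the `i915_do_wait_request` line. It assumes the two numbers are the awaited request sequence number and the sequence number the GPU had reached when the wait failed; `parse_wait_error` is a hypothetical helper name, not part of any Intel tool.

```python
import errno
import re

# Matches the error line quoted in the dmesg output above.
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+)\)"
)


def parse_wait_error(line):
    """Return the fields of a wait-request error line, or None if no match."""
    match = WAIT_RE.search(line)
    if match is None:
        return None
    ret, awaited, current = (int(group) for group in match.groups())
    return {
        # Kernel functions return negative errno values, so negate before lookup.
        "errno_name": errno.errorcode.get(-ret, "unknown"),
        "awaited": awaited,
        "current": current,
        # Assumption: the difference is how far the GPU was from the request.
        "outstanding": awaited - current,
    }


line = ("[drm:i915_do_wait_request] *ERROR* i915_do_wait_request "
        "returns -5 (awaiting 953036 at 931817)")
info = parse_wait_error(line)
print(info)
```

The return value -5 maps to EIO, which is what the waiter gets once the hangcheck has declared the GPU hung.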
WAIT_FOR_EVENT hang. Did you notice anything else happening at the time, like a modeset change, dpms on/off, unplugging a monitor or two? Meh, I need to also include a full register dump in the error state.
(In reply to comment #1)
> WAIT_FOR_EVENT hang. Did you notice anything else happening at the time, like a
> modeset change, dpms on/off, unplugging a monitor or two?

Yes, during the "run", DPMS Off kicked in. I don't know if the hang occurred at the time DPMS went off, or when DPMS went back on (or in between for that matter).

> Meh, I need to also include a full register dump in the error state.

Do you want the output of intel_reg_dumper next time it happens?
Created attachment 37403 [details] [review] trigger scanline wait at pipe off time I wonder if this patch helps? The intent is to trigger any outstanding scanline wait event before shutting off the pipe. When the pipe shuts off, it should end up stopping on the first line of the next frame, so hopefully this register programming is correct.
(In reply to comment #3)
> Created an attachment (id=37403) [details]
> trigger scanline wait at pipe off time
>
> I wonder if this patch helps? The intent is to trigger any outstanding
> scanline wait event before shutting off the pipe. When the pipe shuts off, it
> should end up stopping on the first line of the next frame, so hopefully this
> register programming is correct.

I'm recompiling a kernel right now with this patch. I will report on its effect later. Anything you'd want if I notice a hang again?

Thanks!

Phil.
Created attachment 37613 [details] intel_reg_dumper after screen saver triggered GPU hang

After running xscreensaver-demo, the GPU hangs with:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 802356 at 799091)

The attached intel_reg_dumper output was taken after the hang.
(In reply to comment #3)
> Created an attachment (id=37403) [details]
> trigger scanline wait at pipe off time
>
> I wonder if this patch helps? The intent is to trigger any outstanding
> scanline wait event before shutting off the pipe. When the pipe shuts off, it
> should end up stopping on the first line of the next frame, so hopefully this
> register programming is correct.

No, this patch doesn't help. The GPU still hangs on xscreensaver-demo. Tested on the 2.6.35 kernel, xf86-video-intel-2.12.0, xorg-server-1.8.2.
Can you also grab the error state of the hang with the patch applied so we can confirm the bug is identical? It'll be typical if fixing the WAIT_FOR_EVENT hang means we just hit mesa submitting an illegal op...
Created attachment 37632 [details] i915_error_state without i915-clear-scanline-wait.patch error state after the GPU hang
Created attachment 37633 [details] i915_error_state with i915-clear-scanline-wait.patch error state after the GPU hang, with i915-clear-scanline-wait.patch
Created attachment 37634 [details] intel_reg_dumper without the i915-clear-scanline-wait.patch intel_reg_dumper after the GPU hang
Created attachment 37635 [details] intel_reg_dumper with the i915-clear-scanline-wait.patch intel_reg_dumper after the GPU hang, with the i915-clear-scanline-wait.patch
Created attachment 37688 [details] [review] My variant upon Jesse's idea. (Note this will only apply on top of my for-anholt series of pending patches.)
(In reply to comment #12)
> Created an attachment (id=37688) [details]
> My variant upon Jesse's idea.
>
> (Note this will only apply on top of my for-anholt series of pending patches.)

Would you please give me something that applies to the stable 2.6.35 kernel? I do not know how to dig up the for-anholt series of patches.
Created attachment 37717 [details] i915_error_state.txt after chris's patch
Created attachment 37718 [details] intel_reg_dumper.txt after chris's patch
Ok, this looks mighty dubious:

0x0903c15c: 0x79000002: 3DSTATE_DRAWING_RECTANGLE
0x0903c160: 0x00000000: top left: 0,0
0x0903c164: 0x00000000: bottom right: 0,0
0x0903c168: 0x00000000: origin: 0,0

And the hang is indicative that the batchbuffer is itself the cause. This hang is sufficiently different from the original WAIT_FOR_EVENT hang, and the 0x0 surface could be a vital clue to the original bug.
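To spot the same pattern in other error states, a rough sketch of decoding those four dwords follows. It assumes each coordinate dword packs x in the low 16 bits and y in the high 16 bits, which matches how the dump above prints them; `decode_drawing_rectangle` and the `all_zero` flag are hypothetical names for illustration, not part of intel-gpu-tools.

```python
# Header dword for 3DSTATE_DRAWING_RECTANGLE as it appears in the dump above.
DRAWING_RECTANGLE_HEADER = 0x79000002


def unpack_xy(dword):
    """Split a dword into (x, y), assuming x is bits 15:0 and y is bits 31:16."""
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF


def decode_drawing_rectangle(dwords):
    """Decode [header, top_left, bottom_right, origin] into a small dict."""
    header, top_left, bottom_right, origin = dwords
    if header != DRAWING_RECTANGLE_HEADER:
        raise ValueError("not a 3DSTATE_DRAWING_RECTANGLE packet")
    decoded = {
        "top_left": unpack_xy(top_left),
        "bottom_right": unpack_xy(bottom_right),
        "origin": unpack_xy(origin),
    }
    # Flag the case seen in this bug: every coordinate is zero.
    decoded["all_zero"] = all(value == (0, 0) for value in decoded.values())
    return decoded


hung = decode_drawing_rectangle([0x79000002, 0x00000000, 0x00000000, 0x00000000])
sane = decode_drawing_rectangle([0x79000002, 0x00000000, 0x02FF03FF, 0x00000000])
print(hung)
print(sane)
```

The second call uses made-up values (a bottom-right corner of 1023,767) purely to show what a non-degenerate packet would decode to.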
The GPU hang happens more often with the 2.6.35 kernel. Basically, the machine is unusable with 2.6.35.
commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Nov 13 09:49:11 2010 +0000

    drm/i915: Retire any pending operations on the old scanout when switching

    An old and oft reported bug, is that of the GPU hanging on a
    MI_WAIT_FOR_EVENT following a mode switch. The cause is that the GPU is
    waiting on a scanline counter on an inactive pipe, and so waits for a
    very long time until eventually the user reboots his machine.

    We can prevent this either by moving the WAIT into the kernel and
    thereby incurring considerable cost on every swapbuffers, or by waiting
    for the GPU to retire the last batch that accesses the framebuffer
    before installing a new one. As mode switches are much rarer than swap
    buffers, this looks like an easy choice.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28964
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29252
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
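The ordering the commit message describes can be illustrated with a toy model. This is not the kernel code: `ToyGpu`, its methods, and the string framebuffer names are all hypothetical, and the blocking wait is replaced by simply marking the request as completed. It only shows the idea that the old scanout's last batch is retired before the new framebuffer is installed.

```python
class ToyGpu:
    """Toy model of request sequence numbers and scanout switching."""

    def __init__(self):
        self.next_seqno = 1
        self.completed_seqno = 0
        self.scanout = None
        self.last_use = {}  # framebuffer name -> seqno of last batch using it

    def submit_batch(self, fb):
        """Queue a batch that accesses `fb` and return its seqno."""
        seqno = self.next_seqno
        self.next_seqno += 1
        self.last_use[fb] = seqno
        return seqno

    def retire_up_to(self, seqno):
        """Stand-in for blocking until the GPU has completed `seqno`."""
        self.completed_seqno = max(self.completed_seqno, seqno)

    def set_scanout(self, new_fb):
        """Install `new_fb`, first retiring work that touches the old scanout."""
        pending = self.last_use.get(self.scanout, 0)
        if pending > self.completed_seqno:
            # No batch waiting on the old pipe may still be in flight
            # once the switch happens.
            self.retire_up_to(pending)
        self.scanout = new_fb
        return pending


gpu = ToyGpu()
gpu.set_scanout("fb0")
gpu.submit_batch("fb0")
gpu.submit_batch("fb0")
waited_for = gpu.set_scanout("fb1")
print(waited_for, gpu.completed_seqno, gpu.scanout)
```

The cost of the wait is paid only on a scanout switch, which is the trade-off the commit message argues for over adding work to every swapbuffers.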