Bug 64073

Summary: [ivb] IPEHR:0xffffffff upon context restore
Product: DRI Reporter: Ross Lagerwall <rosslagerwall>
Component: DRM/Intel Assignee: Ben Widawsky <ben>
Status: CLOSED INVALID QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                        Flags
i915 error state                   none
System log                         none
Xorg log                           none
lspci -nn                          none
glxinfo                            none
cat /proc/cpuinfo                  none
i915 error state from v3.10-rc2    none
System log from v3.10-rc2          none

Description Ross Lagerwall 2013-04-30 07:37:50 UTC
System environment:
-- chipset: I'm not sure
-- system architecture: 64-bit
-- xf86-video-intel: 2.21.6
-- xserver: 1.14.0
-- mesa: 9.1.1
-- libdrm: 2.4.44
-- kernel: 3.9.0-rc8
-- Linux distribution: Fedora 19
-- Machine or mobo model: Intel Core i5-3570K with HD 4000
-- Display connector: VGA, single display

Reproducing steps:
In SuperTuxKart, set the resolution to 1920x1080 and the graphics detail
level to 7, and turn on Vertical Sync, Use Framebuffer Objects, and full
screen.
About 10% of the time, when running the game with these settings, the
GPU hangs, and occasionally Xorg crashes afterwards.
The hang can happen in the menu system or within the actual game.
When the hang occurs, the display may update very slowly or only
intermittently, and eventually it stops updating altogether.
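
Because the hang is intermittent, a handful of clean runs does not show that a change fixed it. The sketch below estimates how many hang-free runs are needed before that conclusion is reasonable. It assumes each run hangs independently with the reported probability of roughly 10%, which is an assumption and not something measured in this report; the function names are illustrative only.

```python
import math


def p_at_least_one_hang(p_hang: float, runs: int) -> float:
    """Probability of seeing at least one hang in `runs` independent runs."""
    return 1.0 - (1.0 - p_hang) ** runs


def runs_needed(p_hang: float, confidence: float) -> int:
    """Smallest number of runs n with 1 - (1 - p_hang)**n >= confidence."""
    if not 0.0 < p_hang < 1.0:
        raise ValueError("p_hang must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hang))


if __name__ == "__main__":
    # With a 10% per-run hang rate, how many clean runs give 95% confidence?
    print(runs_needed(0.10, 0.95))
```

Under these assumptions, about 29 consecutive clean runs are needed before an unfixed kernel would have hung with 95% probability.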

Additional info:
I'm not 100% sure, but I think it may have started after setting
Vertical Sync to ON.
After one of the crashes, I managed to capture the system log, the
Xorg log, and the i915_error_state. Unfortunately I was not running
with drm.debug=14.
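
For readers unfamiliar with the drm.debug value: it is a bitmask, and 14 (0x0e) selects every category except the lowest bit. The sketch below decodes it. The bit names are assumed from the kernel's DRM_UT_* definitions of that era (core, driver, kms, prime); newer kernels define additional bits, so treat the table as illustrative.

```python
from typing import List

# Assumed bit layout, based on the DRM_UT_* debug categories in kernels
# around v3.9. Later kernels add more categories.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
}


def decode_drm_debug(mask: int) -> List[str]:
    """Return the names of the debug categories enabled by `mask`."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(14))
```

With this table, 14 decodes to the driver, kms and prime categories, leaving the core messages off.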

Package versions:
Up to date Fedora 19 with:
mesa-libGL-9.1.1-1.fc19.x86_64
kernel-3.9.0-0.rc8.git0.2.fc19.x86_64
libdrm-2.4.44-2.fc19.x86_64
xorg-x11-server-Xorg-1.14.0-6.fc19.x86_64
xorg-x11-drv-intel-2.21.6-1.fc19.x86_64
supertuxkart-data-0.7.3-5.fc19.noarch
supertuxkart-0.7.3-5.fc19.x86_64
Comment 1 Ross Lagerwall 2013-04-30 07:38:32 UTC
Created attachment 78629
i915 error state
Comment 2 Ross Lagerwall 2013-04-30 07:38:53 UTC
Created attachment 78630
System log
Comment 3 Ross Lagerwall 2013-04-30 07:39:11 UTC
Created attachment 78631
Xorg log
Comment 4 Ross Lagerwall 2013-04-30 07:39:29 UTC
Created attachment 78632
lspci -nn
Comment 5 Ross Lagerwall 2013-04-30 07:39:45 UTC
Created attachment 78633
glxinfo
Comment 6 Ross Lagerwall 2013-04-30 07:40:01 UTC
Created attachment 78634
cat /proc/cpuinfo
Comment 7 Chris Wilson 2013-05-01 14:54:33 UTC
This will be interesting to see if

commit 4615d4c9e27eda42c3e965f208a4b4065841498c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 8 14:28:40 2013 +0100

    drm/i915: Use MLC (l3$) for context objects

has any impact. Can you please try the current drm-intel-nightly kernel from ppa:mainline?
Comment 8 Ross Lagerwall 2013-05-01 21:06:18 UTC
(In reply to comment #7)
> This will be interesting to see if
> 
> commit 4615d4c9e27eda42c3e965f208a4b4065841498c
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Apr 8 14:28:40 2013 +0100
> 
>     drm/i915: Use MLC (l3$) for context objects
> 
> has any impact. Can you please try the current drm-intel-nightly kernel from
> ppa:mainline?

Yes, it seems to work well with the drm-intel-nightly kernel.
Comment 9 Daniel Vetter 2013-05-01 21:11:42 UTC
Can you please check whether cherry-picking the referenced patch to a stable kernel fixes the issues, too?
Comment 10 Ross Lagerwall 2013-05-04 08:19:45 UTC
(In reply to comment #9)
> Can you please check whether cherry-picking the referenced patch to a stable
> kernel fixes the issues, too?

Yes, applying it on top of the Ubuntu 3.8 kernel worked fine.
Comment 11 Daniel Vetter 2013-05-06 08:34:20 UTC
I've just sent out the stable backport request, so this should get fixed in the next stable kernel release (or one of the next; around the merge window there's usually a bit of a lag due to the high patch load).

Thanks for reporting this issue and please reopen if it breaks again.
Comment 12 Ross Lagerwall 2013-05-21 19:46:06 UTC
On the same machine, I tried running SuperTuxKart on Arch Linux with Linux kernel 3.10-rc2, mesa 9.1.2 and Intel drivers 2.21.6.

I seemed to get the same hang, even though 3.10-rc2 contains the above-mentioned commit. I will attach the error state and relevant dmesg log.
Comment 13 Ross Lagerwall 2013-05-21 19:46:59 UTC
Created attachment 79626
i915 error state from v3.10-rc2
Comment 14 Ross Lagerwall 2013-05-21 19:47:27 UTC
Created attachment 79627
System log from v3.10-rc2
Comment 15 Chris Wilson 2013-05-21 21:28:15 UTC
Aye, that appears to be the same hang.
Comment 16 Chris Wilson 2013-06-24 21:33:11 UTC
Note that with IVB and MSAA I see lots of corruption, with large swaths of memory being overwritten with pixel values (lots of 0xffffffff especially). That would include the possibility of overwriting context memory. Isolating MSAA in mesa would be tricky... perhaps a hack to disable it?
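
A quick way to check whether a captured error state shows this signature is to scan it for an all-ones IPEHR. The sketch below does that on the dump text. It assumes the dump contains lines of the form "IPEHR: 0x........", which may differ between kernel versions; the sample text and function names are hypothetical and not taken from the attachments on this bug.

```python
import re
from typing import List

# Assumed line format inside an i915 error state dump: "  IPEHR: 0xffffffff".
IPEHR_RE = re.compile(r"^[ \t]*IPEHR:[ \t]*(0x[0-9a-fA-F]+)[ \t]*$", re.MULTILINE)


def ipehr_values(error_state_text: str) -> List[int]:
    """Return every IPEHR value found in the dump, in order of appearance."""
    return [int(m.group(1), 16) for m in IPEHR_RE.finditer(error_state_text)]


def has_all_ones_ipehr(error_state_text: str) -> bool:
    """True if any ring reports IPEHR == 0xffffffff."""
    return 0xFFFFFFFF in ipehr_values(error_state_text)


# Hypothetical excerpt, for illustration only.
SAMPLE = """\
  IPEIR: 0x00000000
  IPEHR: 0xffffffff
  INSTDONE: 0xffffffff
"""

if __name__ == "__main__":
    print(has_all_ones_ipehr(SAMPLE))
```

An all-ones IPEHR does not by itself prove that context memory was overwritten; it only flags dumps that match the pattern described here.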
Comment 17 Ross Lagerwall 2013-06-26 07:43:12 UTC
I can confirm that I did see strange white corruption when playing the
game, but I thought it was unrelated or an application error.

Unfortunately, I don't have access to the hardware anymore so I cannot
further test anything.  However, given that the hangs happened on two
different OSes, with the latest kernel versions, it should be easy
enough to reproduce.
Comment 18 Ben Widawsky 2013-06-26 23:04:43 UTC
(In reply to comment #17)
> I can confirm that I did see strange white corruption when playing the
> game, but I thought it was unrelated or an application error.
> 
> Unfortunately, I don't have access to the hardware anymore so I cannot
> further test anything.  However, given that the hangs happened on two
> different OSes, with the latest kernel versions, it should be easy
> enough to reproduce.

If someone can reproduce this, can they read back register 0x20f4?
Comment 19 Ross Lagerwall 2013-06-30 07:53:25 UTC
On Wed, Jun 26, 2013 at 11:04:43PM +0000, bugzilla-daemon@freedesktop.org wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=64073
> 
> --- Comment #18 from Ben Widawsky <ben@bwidawsk.net> ---
> If someone can reproduce this, can they read back register 0x20f4?
> 

Would that not be in the error state dump I attached?
Comment 20 Chris Wilson 2013-08-11 12:01:58 UTC
Hopefully https://patchwork.kernel.org/patch/2841344/ is the right fix.
Comment 21 Ben Widawsky 2013-08-13 04:19:57 UTC
(In reply to comment #19)
> On Wed, Jun 26, 2013 at 11:04:43PM +0000, bugzilla-daemon@freedesktop.org
> wrote:
> > https://bugs.freedesktop.org/show_bug.cgi?id=64073
> > 
> > --- Comment #18 from Ben Widawsky <ben@bwidawsk.net> ---
> > If someone can reproduce this, can they read back register 0x20f4?
> > 
> 
> Would that not be in the error state dump I attached?

No. But I can no longer remember what I wanted anyway.
Comment 22 Daniel Vetter 2013-08-13 06:55:18 UTC
(In reply to comment #20)
> Hopefully https://patchwork.kernel.org/patch/2841344/ is the right fix.

Can you please test the above patch?
Comment 23 Ross Lagerwall 2013-08-13 11:40:10 UTC
(In reply to comment #22)
> (In reply to comment #20)
> > Hopefully https://patchwork.kernel.org/patch/2841344/ is the right fix.
> 
> Can you please test the above patch?

Unfortunately, as I said in comment #17, I don't have access to the IVB hardware anymore so I can't test the patch.
Comment 24 Daniel Vetter 2013-08-13 15:08:30 UTC
Hw no longer available for testing, so closing.
