Bug 90091

Summary: [BDW] GPU HANG: ecode 8:0:0x85dffffb, in Xorg [937] (IOMMU)
Product: DRI
Component: DRM/Intel
Version: XOrg git
Hardware: Other
OS: All
Status: CLOSED DUPLICATE
Severity: normal
Priority: medium
Reporter: Yves-Alexis <corsac>
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: dwmw2, intel-gfx-bugs, post
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description Flags
/sys/class/drm/card0/error (xzipped) none

Description Yves-Alexis 2015-04-18 21:30:44 UTC
Hi,

I just experienced a GPU hang on a Broadwell system (ThinkPad X250).

The dmesg looks like https://paste.debian.net/167542/ and the error file is attached.

I'm running Debian sid, with a (custom) 4.0 kernel, DDX 2.99.917, libdrm 2.4.58 and mesa 10.4.2. As the log indicates, the IOMMU is enabled (with intel_iommu=on). Two other kernel command line arguments might be relevant: i915.enable_ips=0 and intremap_no_x2apic_optout.

If you need anything more, please ask.
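
For readers triaging similar reports, the "ecode" field in the summary can be split into its three parts. This is a minimal sketch, not part of the original report: it assumes the i915 convention of that kernel era, where the fields are the GPU generation, the index of the hung ring, and a 32-bit code derived from ring state. The helper name parse_ecode is hypothetical.

```python
import re

def parse_ecode(summary):
    """Split 'ecode G:R:0xHHHHHHHH' into (gen, ring, code).

    Assumption: the fields are GPU generation, ring index and a
    32-bit hang code, as printed by the i915 driver around 4.0.
    """
    m = re.search(r"ecode (\d+):(\d+):0x([0-9a-fA-F]{8})", summary)
    if m is None:
        raise ValueError("no ecode field found")
    return int(m.group(1)), int(m.group(2)), int(m.group(3), 16)

gen, ring, code = parse_ecode(
    "[BDW] GPU HANG: ecode 8:0:0x85dffffb, in Xorg [937] (IOMMU)")
print(gen, ring, hex(code))
```

Under that assumption, the leading 8 is consistent with a gen8 (Broadwell) part and the 0 with the first (render) ring.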
Comment 1 Yves-Alexis 2015-04-18 21:31:57 UTC
Created attachment 115183 [details]
/sys/class/drm/card0/error (xzipped)

OK, so the attachment actually failed (the file is 3M, which I guess is more than the 3000k limit).
Comment 2 Mika Kuoppala 2015-04-20 15:22:42 UTC
ERROR: 0x00000008
    Invalid physical address in ROSTRM interface (PAVP)

0x1e0a18b0:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0x1e0a18b4:      0x00101002:    destination address
0x1e0a18b8:      0x00000000:    immediate dword low
0x1e0a18bc:      0x00000000:    immediate dword high
0x1e0a18c8:      0x784d0000: 3D UNKNOWN: 3d_965 opcode = 0x784d
0x1e0a18cc:      0x71946400: 3D UNKNOWN: 3d_965 opcode = 0x7194
0x1e0a18d0:      0x78240000: 3D UNKNOWN: 3d_965 opcode = 0x7824
0x1e0a18d4:      0x00007ec1: MI_NOOP

Head pointing to 0x1e0a18c8.

Looks like PIPE_CONTROL is structured for gen < 8. That would mean DDX is not recent enough.
Comment 3 Yves-Alexis 2015-04-20 15:26:45 UTC
(In reply to Mika Kuoppala from comment #2)
> ERROR: 0x00000008
>     Invalid physical address in ROSTRM interface (PAVP)
> 
> 0x1e0a18b0:      0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC
> write flush, no inst flush
> 0x1e0a18b4:      0x00101002:    destination address
> 0x1e0a18b8:      0x00000000:    immediate dword low
> 0x1e0a18bc:      0x00000000:    immediate dword high
> 0x1e0a18c8:      0x784d0000: 3D UNKNOWN: 3d_965 opcode = 0x784d
> 0x1e0a18cc:      0x71946400: 3D UNKNOWN: 3d_965 opcode = 0x7194
> 0x1e0a18d0:      0x78240000: 3D UNKNOWN: 3d_965 opcode = 0x7824
> 0x1e0a18d4:      0x00007ec1: MI_NOOP
> 
> Head pointing to 0x1e0a18c8.
> 
> Looks like PIPE_CONTROL is structured for gen < 8. That would mean DDX is
> not recent enough.

DDX is 2.99.917. I can try git if needed, but I'd assume 2.99.917 already knows about BDW? Would Xorg.0.log be helpful here?
Comment 4 Chris Wilson 2015-04-20 19:59:18 UTC
Looks like intel_error_decode is confused (and not reporting the basics properly). The GPU is just stopping on the pipe-control barrier, with the usual suspect being DMAR.
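
One way to sanity-check the decoded dump in comment 2 is to compute the command length from the PIPE_CONTROL header itself. The sketch below assumes the usual Intel 3D command encoding, where the low 8 bits of the header hold the total dword count minus 2; the function name command_dwords is hypothetical and this is not how intel_error_decode is implemented.

```python
def command_dwords(header):
    # Assumption: low 8 bits of the header encode (total dwords - 2).
    return (header & 0xFF) + 2

start = 0x1e0a18b0    # address of the PIPE_CONTROL header in the dump
header = 0x7a000004   # header dword from the dump

# Each dword is 4 bytes, so the next command starts here:
next_cmd = start + 4 * command_dwords(header)
print(command_dwords(header), hex(next_cmd))
```

Under that assumption the header describes a 6-dword command ending right at 0x1e0a18c8, which is where the head pointer sits. That would fit a decoder that printed only four dwords of the command and then misread what followed.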
Comment 5 Yves-Alexis 2015-04-21 08:53:41 UTC
I've added igfx_off to the intel_iommu parameter. I had the feeling that it was not necessary on recent Intel {C,G}PUs, but I'll report back if it improves the situation.
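
For anyone reproducing this, a small sketch for checking whether igfx_off is active on the running kernel. It assumes intel_iommu= takes a comma-separated option list (for example intel_iommu=on,igfx_off) and that /proc/cmdline is readable; the helper names are hypothetical.

```python
def iommu_options(cmdline):
    """Collect the comma-separated options of every intel_iommu= token."""
    opts = []
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            opts.extend(token.split("=", 1)[1].split(","))
    return opts

def igfx_disabled(cmdline):
    """True if igfx_off appears among the intel_iommu= options."""
    return "igfx_off" in iommu_options(cmdline)

def read_cmdline(path="/proc/cmdline"):
    # Reads the command line of the running kernel (Linux only).
    with open(path) as f:
        return f.read()
```

On the affected machine, igfx_disabled(read_cmdline()) should return True after a reboot with the new parameter.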
Comment 6 Yves-Alexis 2015-05-03 18:55:08 UTC
(In reply to Yves-Alexis from comment #5)
> I've added igfx_off to the intel_iommu parameter. I had the feeling that it
> was not necessary on recent Intel {C,G}PUs, but I'll report back if it
> improves the situation.

So it seems that igfx_off is indeed needed even on bdw: the system seems stable since then. I'll retry the same setup without igfx_off again just to confirm.
Comment 7 Jani Nikula 2015-05-04 09:11:26 UTC
(In reply to Yves-Alexis from comment #6)
> (In reply to Yves-Alexis from comment #5)
> > I've added igfx_off to the intel_iommu parameter. I had the feeling that it
> > was not necessary on recent Intel {C,G}PUs, but I'll report back if it
> > improves the situation.
> 
> So it seems that igfx_off is indeed needed even on bdw: the system seems
> stable since then. I'll retry the same setup without igfx_off again just to
> confirm.

CC David
Comment 8 Chris Wilson 2015-06-11 09:51:18 UTC

*** This bug has been marked as a duplicate of bug 89360 ***
