Hi, I just experienced a GPU hang on a Broadwell machine (ThinkPad X250). The dmesg output is at https://paste.debian.net/167542/ and the error file is attached. I'm running Debian sid with a custom 4.0 kernel, DDX 2.99.917, libdrm 2.4.58 and Mesa 10.4.2. As the log indicates, the IOMMU is enabled (intel_iommu=on). Other kernel command-line arguments that might be relevant: i915.enable_ips=0 and intremap_no_x2apic_optout. If you need anything more, please ask.
Created attachment 115183: /sys/class/drm/card0/error (xz-compressed)

OK, so actually the attachment failed (the size is 3M, more than the 3000k limit, I guess).
ERROR: 0x00000008
Invalid physical address in ROSTRM interface (PAVP)

0x1e0a18b0: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
0x1e0a18b4: 0x00101002: destination address
0x1e0a18b8: 0x00000000: immediate dword low
0x1e0a18bc: 0x00000000: immediate dword high
0x1e0a18c8: 0x784d0000: 3D UNKNOWN: 3d_965 opcode = 0x784d
0x1e0a18cc: 0x71946400: 3D UNKNOWN: 3d_965 opcode = 0x7194
0x1e0a18d0: 0x78240000: 3D UNKNOWN: 3d_965 opcode = 0x7824
0x1e0a18d4: 0x00007ec1: MI_NOOP

Head pointing to 0x1e0a18c8.

Looks like PIPE_CONTROL is structured for gen < 8. That would mean the DDX is not recent enough.
(In reply to Mika Kuoppala from comment #2)
> ERROR: 0x00000008
> Invalid physical address in ROSTRM interface (PAVP)
>
> 0x1e0a18b0: 0x7a000004: PIPE_CONTROL: no write, no depth stall, no RC write flush, no inst flush
> 0x1e0a18b4: 0x00101002: destination address
> 0x1e0a18b8: 0x00000000: immediate dword low
> 0x1e0a18bc: 0x00000000: immediate dword high
> 0x1e0a18c8: 0x784d0000: 3D UNKNOWN: 3d_965 opcode = 0x784d
> 0x1e0a18cc: 0x71946400: 3D UNKNOWN: 3d_965 opcode = 0x7194
> 0x1e0a18d0: 0x78240000: 3D UNKNOWN: 3d_965 opcode = 0x7824
> 0x1e0a18d4: 0x00007ec1: MI_NOOP
>
> Head pointing to 0x1e0a18c8.
>
> Looks like PIPE_CONTROL is structured for gen < 8. That would mean DDX is
> not recent enough.

The DDX is 2.99.917. I can try git if needed, but I'd assume 2.99.917 already knows about BDW? Would Xorg.0.log be helpful here?
Looks like intel_error_decode is confused (and not reporting the basics properly). It is just stopping on the pipe-control barrier, with the usual suspect being dmar.
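For anyone wanting to check the PIPE_CONTROL header from comment #2 by hand, here is a minimal sketch. It assumes the usual i915 command-header layout, where the low 8 bits hold the total dword count minus 2 (as in the kernel's GFX_OP_PIPE_CONTROL(len) macro). The function names are hypothetical and are not part of intel_error_decode.

```python
# Minimal sketch: decode the length of a 3D-pipeline command header.
# Assumption: bits 7:0 of the header hold (total dwords - 2), as in the
# i915 GFX_OP_PIPE_CONTROL(len) macro. All function names are hypothetical.

# Command type (bits 31:29), subtype (28:27) and opcode (26:24) for PIPE_CONTROL.
PIPE_CONTROL_OPCODE = (0x3 << 29) | (0x3 << 27) | (0x2 << 24)  # 0x7a000000


def is_pipe_control(header: int) -> bool:
    """Return True if the upper 16 bits match the PIPE_CONTROL opcode."""
    return (header & 0xFFFF0000) == PIPE_CONTROL_OPCODE


def command_dwords(header: int) -> int:
    """Total number of dwords in the command, header included."""
    return (header & 0xFF) + 2


def next_command_address(address: int, header: int) -> int:
    """Address of the command that follows (4 bytes per dword)."""
    return address + 4 * command_dwords(header)


if __name__ == "__main__":
    # Values taken from the decoded dump in comment #2.
    addr, header = 0x1E0A18B0, 0x7A000004
    print(is_pipe_control(header))
    print(command_dwords(header))
    print(hex(next_command_address(addr, header)))
```

With the header 0x7a000004 this gives a 6-dword command, so the next command would start at 0x1e0a18c8, which is where the head is pointing. Six dwords is the gen8 length for PIPE_CONTROL, so this would be consistent with the decoder misreporting the command rather than the DDX emitting a pre-gen8 layout.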
I've added igfx_off to the intel_iommu parameter. I had the feeling that it was not necessary on recent Intel {C,G}PUs, but I'll report back if it improves the situation.
(In reply to Yves-Alexis from comment #5)
> I've added igfx_off to the intel_iommu parameter. I had the feeling that it
> was not necessary on recent Intel {C,G}PUs, but I'll report back if it
> improves the situation.

So it seems that igfx_off is indeed needed even on BDW: the system seems stable since then. I'll retry the same setup without igfx_off again just to confirm.
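To double-check which intel_iommu options a kernel was booted with, something like the following can be used. This is only a sketch: it assumes the options are passed comma-separated (for example intel_iommu=on,igfx_off), and intel_iommu_options is a hypothetical helper, not an existing tool. The sample command line is illustrative; on a live system the string would come from /proc/cmdline.

```python
# Minimal sketch: list the intel_iommu= options found on a kernel command line.
# Assumption: options are comma-separated, e.g. intel_iommu=on,igfx_off.
# intel_iommu_options is a hypothetical helper written for illustration.

def intel_iommu_options(cmdline: str) -> set:
    """Collect every option given to intel_iommu= (all occurrences)."""
    options = set()
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key == "intel_iommu" and sep:
            options.update(opt for opt in value.split(",") if opt)
    return options


if __name__ == "__main__":
    # Illustrative command line; replace with the contents of /proc/cmdline.
    sample = "intel_iommu=on,igfx_off i915.enable_ips=0"
    opts = intel_iommu_options(sample)
    print("igfx_off set" if "igfx_off" in opts else "igfx_off not set")
```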
(In reply to Yves-Alexis from comment #6)
> (In reply to Yves-Alexis from comment #5)
> > I've added igfx_off to the intel_iommu parameter. I had the feeling that it
> > was not necessary on recent Intel {C,G}PUs, but I'll report back if it
> > improves the situation.
>
> So it seems that igfx_off is indeed needed even on bdw: the system seems
> stable since then. I'll retry the same setup without igfx_off again just to
> confirm.

CC David
*** This bug has been marked as a duplicate of bug 89360 ***