Summary:    PRIME/Hybrid graphics render offload issues with gen7
Product:    DRI
Component:  DRM/Intel
Status:     CLOSED FIXED
Severity:   normal
Priority:   medium
Version:    unspecified
Hardware:   x86-64 (AMD64)
OS:         Linux (All)
Reporter:   Nathan Schulte <nmschulte>
Assignee:   Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC:         intel-gfx-bugs
Attachments: drm error state dump (97018), Xorg log from error state dump (97019), kernel log from error state dump (97020)
Description
Nathan Schulte, 2014-04-06 00:18:59 UTC
Kinda not enough information for an actionable bug report ...

Created attachment 97018 [details]
drm error state dump
I'm experiencing issues with Offload Sink/Source with an Intel HD 4600 (Core i7 4910MQ) and AMD Radeon HD 8970M. The machine is a laptop, a Clevo P150SM based machine (Sager NP8265), which has the screen connected via eDP to the integrated Intel chip; the discrete card has no connected heads. I'm running Debian Sid, and experience the issue with the 3.13 and 3.14-rc7 kernels, with Debian's Xorg, DDX, DRM, and Mesa:

--
$ Xorg -version

X.Org X Server 1.15.0.901 (1.15.1 RC 1)
Release Date: 2014-03-21
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.12-1-amd64 x86_64 Debian
Current Operating System: Linux nms-debian 3.14-rc7-amd64 #1 SMP Debian 3.14~rc7-1~exp1 (2014-03-17) x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.14-rc7-amd64 root=UUID=74330f52-bac4-487d-9874-aa51e41d6b34 ro quiet
Build Date: 31 March 2014 10:25:32AM
xorg-server 2:1.15.0.901-1 (http://www.debian.org/support)
Current version of pixman: 0.32.4
Before reporting problems, check http://wiki.x.org to make sure that you have the latest version.

$ glxinfo | grep 'version\|renderer '
server glx version string: 1.4
client glx version string: 1.4
GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer,
GLX version: 1.4
GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer,
OpenGL renderer string: Mesa DRI Intel(R) Haswell Mobile
OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.1.0
OpenGL core profile shading language version string: 3.30
OpenGL version string: 3.0 Mesa 10.1.0
OpenGL shading language version string: 1.30
--

I've also experienced the issue when building DRM/DRI, Mesa3D (classic i915/i965 and Gallium w/ LLVM for radeonsi; no Gallium3D for Intel, only for Radeon), and the DDXs (xf86-video-intel and xf86-video-ati) from mainline; same issues, it seems.

I received an error state dump at one point, which I have attached here. Quite honestly, I'm not certain the dump is actually related to this issue, as I've had issues with Xorg loading the system-wide DRI regardless of how much I tried to force it otherwise. (Before giving up, I last simply copied the drivers to both locations, but I was still receiving segfaults from many applications; perhaps there is simply an issue with that particular build of mainline, and it has nothing to do with different driver versions loading.)

These are the commands I was running to test:

--
$ cat go.sh
#!/usr/bin/env bash
export LIBGL_DEBUG=verbose
echo "$(date +%H:%M:%S.%N) ---- setting up offload sink" >> /var/log/kern.log
xrandr --setprovideroffloadsink 0x45 0x6f
echo "$(date +%H:%M:%S.%N) ---- executing glxinfo" >> /var/log/kern.log
DRI_PRIME=1 glxinfo > glxinfo.out
--

After running these commands, X seems to crash, and I'm presented with my greeter/face chooser again. I seem to be able to repeat this as many times as I like, though I believe I saw X initializing the cards in a different order sometimes when doing so.

As well, I believe output delegation is working fine: when I set up the discrete card as an output sink, I was presented with a black screen with "INIT 2.88" (or something similar) printed at the top left, as though the screen was configured with a console, but no text was printed after the initial framebuffer initialization (or something). The machine seemed to respond as though nothing bad had happened, and I was able to turn off the machine by switching to vtty1 and typing the right things.
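The go.sh script above hardcodes the provider IDs 0x45 and 0x6f, which are not stable across boots or kernels; later in this thread the reporter notes the IDs were "bumped by 0x10" after a kernel upgrade. A sketch of the same setup using provider names instead of raw IDs follows; the --listproviders output is illustrative only (actual IDs, caps, and names vary per machine and driver):

--
# Discover the render providers known to the running X server.
$ xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x55 cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 3 outputs: 4 associated providers: 0 name:Intel
Provider 1: id: 0x7f cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 6 outputs: 0 associated providers: 0 name:radeon

# Make the discrete card ("radeon") a render offload source for the
# integrated card ("Intel") driving the eDP panel; xrandr accepts
# provider names as well as IDs.
$ xrandr --setprovideroffloadsink radeon Intel

# Per-application offload: DRI_PRIME=1 asks Mesa to render on the
# offload provider; the renderer string should then name the Radeon.
$ DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
--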
As well, I apologize for the less-than-half-complete bug report: I accidentally submitted the form and couldn't find a way to delete it; I was distracted while fleshing it out.

Created attachment 97019 [details]
Xorg log from error state dump
Created attachment 97020 [details]
kernel log from error state dump
The X crash indeed looks independent. It is either just a plain bug in the Xserver (try going back to 1.14 or using the 1.15 release, more likely to better using 1.14), or something to do with

  [ 597.626] (WW) RADEON(G0): Direct rendering disabled

as the crash involves DRI.

The GPU hang looks to be a broken blorp command packet inside a batchbuffer. Either it was truly incomplete, or something overwrote the batch before execution.

I will experiment with a couple of different versions of the software to see if I can mitigate the X crash or such, or see if I can get another error state dump. Assuming those are unfruitful, you're thinking this might be an X bug (or perhaps one of the DDXs in particular)? I will try removing all of the X clients from the picture as well, by using startx or such.

It looks like that particular set of logs/dumps is from 3.14-rc7 with the mainline builds for DRM/Mesa3D. As well, would someone be able to set me straight on which repository is building which half (or third?) of the DRI code? That would be much appreciated, as I can't seem to keep it straight.

As well, any ideas as to why Xorg would be blatantly ignoring my LIBGL_DRIVERS_PATH (tried set to both /usr/local/x86_64-linux-gnu/lib and /usr/local/x86_64-linux-gnu/lib/dri)? Xorg loads the "intel" module properly (as I've set up the module path in the config), but after that the DRI from /usr/lib/x86_64-linux-gnu/dri is loaded by Xorg, while all other processes seem to get the right one from /usr/local. The intel module in /usr/local is properly built by the build system with DRI_DRIVER_PATH=/usr/local/x86_64-linux-gnu/lib/dri, and I am then at a loss. This happens even when I run Xorg as my user from a shell.

(In reply to comment #9)
> I will experiment with a couple of different versions of the software to see
> if I can mitigate the X crash or such, or see if I can get another error
> state dump. Assuming those are unfruitful, you're thinking this might be an
> X bug (or perhaps one of the DDXs in particular)? I will try removing all
> of the X clients from the picture as well, by using startx or such.

The crash looks like a core Xserver bug, but it could be one of the DDXs feeding in garbage. Double-checking with an older Xserver seems reasonable.

> It looks like that particular set of logs/dumps is from 3.14-rc7 with the
> mainline builds for DRM/Mesa3D. As well, would someone be able to set me
> straight on which repository is building which half (or third?) of the DRI
> code?

http://cgit.freedesktop.org/mesa/mesa/

> As well, any ideas as to why Xorg would be blatantly ignoring my
> LIBGL_DRIVERS_PATH?

X is a setuid root binary; it has to be very careful about which environment variables it uses (basically, it can't trust any of them). The path X uses to load its DRI interface is determined at compile time by `$PKG_CONFIG --variable=dridriverdir dri`.
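That compile-time lookup can be reproduced from a shell; a sketch (the printed directory depends on the distro's dri.pc, and on the reporter's Debian system it would be the distro path, matching the /usr/lib/x86_64-linux-gnu/dri the server was observed loading from):

--
# Ask pkg-config for the DRI driver directory baked into the X server
# at build time; environment variables such as LIBGL_DRIVERS_PATH do
# not override this for the (setuid) server.
$ pkg-config --variable=dridriverdir dri
/usr/lib/x86_64-linux-gnu/dri
--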
Ah, I see. That means I ought to be building X when building DRI, correct? And am I right to believe that X loading one version while everything else loads another is prone to error?

I meant to ask for clarification: you're suggesting I run 1.14 rather than 1.15? "More likely to better 1.14" means what, exactly? I'll give this a shot once I get some free time.

I tried this again today with Debian's packaged 3.16-rc5 kernel, and things look much better! This also seems to be working just as well with the packaged 3.15-trunk (3.15.5) kernel. All of the provider IDs have been bumped by 0x10, and I receive a seemingly harmless symbol issue with LIBGL_DEBUG=verbose:

libGL: driver does not expose __driDriverGetExtensions_radeonsi(): /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so: undefined symbol: __driDriverGetExtensions_radeonsi

I think this means we can close the bug report, though for closure I'm curious what the fix was. I'm willing to bisect the fix if it may be of use.

This is also now working with the Debian packaged 3.14.12 kernel (built 11 July 2014)! The window decorations for the DRI_PRIME-backed glxgears window suffer from corruption/artifacting, but I see very similar (probably identical) corruption at other times in GNOME (mainly when locking/waking and in display configuration previews), even from before PRIME/hybrid graphics was working. Should I make another report for this, or is it likely that this is an issue with GNOME, and not something this low in the stack?

If you get corruption when not using PRIME, that is a different issue with the hw/drivers that needs to be analyzed.

First, I believe this bug report is actually the same problem as this one: https://bugs.freedesktop.org/show_bug.cgi?id=80001

The window decorations shouldn't be rendered via PRIME either (even if the window buffer/surface _is_ rendered by PRIME), correct? I'll file another bug report for this issue.

One more question: what do I need to do in order to "turn off" the Intel card and have all "work" done on the discrete (AMD) card? I believe I was referring to this as "output delegation" in my original post; the other kind of PRIME/hybrid graphics. It should also be possible (perhaps only useful in theory) to do "render offload" in the same way, but with _everything_ being offloaded to the discrete card (e.g. DRI_PRIME=1 from boot or such, so the Intel card is never used for rendering), correct? (See the xrandr sketch at the end of this thread.)

Ok, so it looks like the actual issue described in this bug is fixed anyway. For questions about PRIME, dri-devel@lists.freedesktop.org is probably the best place.

Indeed, the bug is fixed. Thanks.
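For reference on the "output delegation" question above, the xrandr side of output offload looks like the following sketch. Provider names are illustrative; and since the panel is wired to the Intel chip via eDP on this machine, the Intel display hardware must stay active for scanout, so the integrated GPU cannot be fully "turned off" here.

--
# Render everything on the discrete card and let the integrated card's
# outputs (the eDP panel) scan out the result: set "radeon" as the
# source of display images for the "Intel" provider's outputs.
$ xrandr --setprovideroutputsource Intel radeon

# Re-run output configuration so the change takes effect.
$ xrandr --auto
--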