Bug 99769

Summary: RADEON(0): flip queue failed in radeon_scanout_flip: Cannot allocate memory
Product: xorg
Reporter: jorge_monteagudo
Component: Driver/Radeon
Assignee: xf86-video-ati maintainers <xorg-driver-ati>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: julien.isorce
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
Attachments (description, flags):
* Xorg.0.log, flags: none
* dmesg, flags: none
* Xorg.0.log from xserver-xorg-video-radeon 1:7.8.99+git1702081933.1351e4~gd~x, flags: none
* If a TearFree flip fails, fall back to non-TearFree operation, flags: none
* If a TearFree flip fails, fall back to non-TearFree operation, flags: none

Description jorge_monteagudo 2017-02-11 09:43:37 UTC
Created attachment 129501 [details]
Xorg.0.log

Hi all!

I've installed the following packages from the Oibaf PPA on a workstation with Ubuntu 16.04, kernel 4.4.0-62-generic and an AMD BONAIRE card.

* libdrm 2.4.75+git1702030630.fe7cb3~gd~x_amd64
* libva 1.7.1-2~gd~x_amd64
* llvm 4.0_4.0~+rc1-1~gd~x_amd64
* mesa 17.1~git1702090730.f3d911~gd~x_amd64
* wayland 1.11.0-2~gd~x_amd64
* xserver-xorg-video-amdgpu 1.2.99+git1701262000.49b092~gd~x
* xserver-xorg-video-ati 7.8.99+git1702081933.1351e4~gd~x

I'm using the radeon driver with the following config file:

20-device.conf

Section "Device"
    Identifier "Device0"
    Driver "radeon"
    Option "AccelMethod" "glamor"
    Option "DRI" "3"
    Option "TearFree" "on"
    Option "ColorTiling" "on"
    Option "ColorTiling2D" "on"
EndSection

Xorg runs OK and glxgears works OK, but when I run my OpenGL app I get a black screen with some artifacts and Xorg.0.log starts to show the following warnings:

[   306.739] (WW) RADEON(0): flip queue failed in radeon_scanout_flip: Cannot allocate memory
[   306.756] (WW) RADEON(0): flip queue failed in radeon_scanout_flip: Cannot allocate memory
[   306.772] (WW) RADEON(0): flip queue failed in radeon_scanout_flip: Cannot allocate memory
[   306.789] (WW) RADEON(0): flip queue failed in radeon_scanout_flip: Cannot allocate memory
[   306.806] (WW) RADEON(0): flip queue failed in radeon_scanout_flip: Cannot allocate memory

I've found the message in 'src/drmmode_display.c' from 'xserver-xorg-video-ati-1:7.8.99', but I don't know what the origin of the problem is. What can I do?

Thanks!
Jorge
Comment 1 jorge_monteagudo 2017-02-11 09:44:02 UTC
Created attachment 129502 [details]
dmesg
Comment 2 Michel Dänzer 2017-02-13 03:41:01 UTC
Does this still happen with xf86-video-ati 7.7.1 or newer?
Comment 3 jorge_monteagudo 2017-02-13 15:17:23 UTC
I've tracked the problem down to some code handling OpenGL contexts in my application.

With Catalyst and NVIDIA the code works OK, but with the radeon/amdgpu driver the screen is painted with artifacts and the 'Cannot allocate memory' warning appears in the Xorg log.

I'm using Qt5 and SDL libraries. The problematic code is:

static GLXContext defaultContext = NULL;
static GLXContext qtContext      = NULL;
static QGLPixelBuffer* pixelBuffer = NULL;
...
pixelBuffer = new QGLPixelBuffer (QSize (1,1), QGLFormat::defaultFormat ());
pixelBuffer->makeCurrent();

qtContext = glXGetCurrentContext();

// create default context
defaultContext = CreateOGLContext (qtContext);

// activate it, hiding context created by SDL
ActivateOGLContext (defaultContext);


Then all the OpenGL objects are created in this 'defaultContext', and something upsets the OpenGL stack. Removing the QGLPixelBuffer and using the default OpenGL context created by SDL does the trick, but then I can't use Qt5 with its own context.

Well, I'll try to isolate a minimal example that reproduces this problem, but for now I can run my app.

Thanks!




The rest of the code:

GLXContext CreateOGLContext(GLXContext share)
{
	GLXContext ctx = NULL;

	// Set error handler
	int32_t (*oldHandler)(Display*, XErrorEvent*) = XSetErrorHandler( &XErrorHandler );

	// Flush errors
	XSync( m_sdl_info.info.x11.display, False );

	// Get window attributes
	XWindowAttributes xWindowAttribs;
	XGetWindowAttributes( m_sdl_info.info.x11.display, m_sdl_info.info.x11.window, &xWindowAttribs );

	// Get visual id from current display
	XVisualInfo visualInfo;
	int32_t 	numItems;

	visualInfo.visualid = XVisualIDFromVisual( xWindowAttribs.visual );

	// Get visual info matching the desired visual id
	XVisualInfo* pVisualInfo = XGetVisualInfo( m_sdl_info.info.x11.display,
											   VisualIDMask,
											   &visualInfo,
											   &numItems );
	if (!pVisualInfo)
	{
		fprintf( stderr, "[CreateOGLContext] Error obtaining visual info\n" );
	}
	else
	{
		// Print selected visual
		// fprintf( stderr, "[CreateOGLContext] Visual 0x%x has been selected\n", (uint32_t)pVisualInfo->visualid );

		// Create new context sharing display lists with current one
		Bool bDirect = share? glXIsDirect( m_sdl_info.info.x11.display, share ) : True;

		ctx = glXCreateContext( m_sdl_info.info.x11.display, pVisualInfo, share, bDirect );

		// Flush errors
		XSync( m_sdl_info.info.x11.display, False );

		// Free visual info
		XFree( pVisualInfo );
	}

	XSetErrorHandler( oldHandler );

	return ctx;
}


int ActivateOGLContext( GLXContext ctx )
{
	// Returns TRUE by default, which avoids extra checks when no contexts are enabled.
	int ret = 1;

	// Nothing to do if the requested context is already current
	if( ctx == glXGetCurrentContext () )
		return ret;

	XLockDisplay( m_sdl_info.info.x11.display );

	ret = glXMakeCurrent( m_sdl_info.info.x11.display, (ctx == NULL)? None : m_sdl_info.info.x11.window, ctx );

	XSync( m_sdl_info.info.x11.display, False );

	XUnlockDisplay( m_sdl_info.info.x11.display );
	return ret;
}
Comment 4 Michel Dänzer 2017-02-14 02:21:27 UTC
There's no direct interaction between application OpenGL stuff and the failing code in the Xorg driver. The only thing I can imagine is that QGLPixelBuffer might leak OpenGL resources, which might ultimately prevent the dedicated scanout buffers used for TearFree from fitting into VRAM. It seems rather unlikely that this could happen with current xf86-video-ati though.

BTW, the original bug description mentions xserver-xorg-video-ati 7.8.99+git1702081933.1351e4~gd~x, but the attached Xorg log file shows version 7.7.0 being used. Did you actually test xf86-video-ati newer than 7.7.0?
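
For a sense of scale regarding the dedicated scanout buffers mentioned above, here is a rough back-of-the-envelope sketch. It assumes linear 32-bit buffers and two scanout buffers per CRTC, and it ignores pitch alignment and tiling padding. The 1920x1080 mode is hypothetical, since the report does not state the resolution, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Approximate size in bytes of one linear scanout buffer.
// Ignores pitch alignment and tiling padding, which a real driver adds.
constexpr std::uint64_t scanoutBytes(std::uint32_t width,
                                     std::uint32_t height,
                                     std::uint32_t bytesPerPixel = 4)
{
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Assumption: TearFree keeps two dedicated scanout buffers per CRTC.
constexpr std::uint64_t tearFreeBytes(std::uint32_t width,
                                      std::uint32_t height,
                                      std::uint32_t crtcCount = 1)
{
    return 2 * scanoutBytes(width, height) * crtcCount;
}

// Hypothetical 1920x1080 mode at 32 bits per pixel:
// 2 * 1920 * 1080 * 4 = 16588800 bytes, roughly 15.8 MiB.
static_assert(tearFreeBytes(1920, 1080) == 16588800ULL,
              "two 1920x1080 buffers at 4 bytes per pixel");
```

Under these assumptions the TearFree buffers need only a few tens of MiB, so failing to fit them would suggest that VRAM is almost completely occupied by something else.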
Comment 5 jorge_monteagudo 2017-02-14 18:33:41 UTC
Thanks for the explanation!

You're right! The log trace is from the /usr/lib/xorg/modules/drivers/radeon_drv.so from the 'xserver-xorg-video-radeon 7.7.0-1' package. I've updated to 'xserver-xorg-video-radeon 7.8.99+git1702081933.1351e4~gd~x' with the same bad behavior.

Is there any environment var that I could set to dump verbose info in order to isolate the problem? I would like to have the same application code for all the supported hardware in our systems.
Comment 6 Michel Dänzer 2017-02-16 06:45:40 UTC
(In reply to jorge_monteagudo from comment #5)
> I've updated to 'xserver-xorg-video-radeon 7.8.99+git1702081933.1351e4~gd~x'
> with the same bad behavior.

Can you attach a log file from that corresponding to a failure?


> Is there any environment var that I could set to dump verbose info in order
> to isolate the problem?

GALLIUM_HUD=requested-VRAM+VRAM-usage

will draw a HUD overlay graph showing how much VRAM the OpenGL application / driver wants to use, and how much is being used in total by the system.
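
The usual way to set this variable is to prefix the command line in the shell. If the application has to enable the HUD itself, a minimal sketch could look like the following. It assumes a POSIX system (for `setenv`) and that Mesa reads the variable when the OpenGL context is created, so the call would have to happen before that. The function name `enableVramHud` is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv (POSIX)

// Request the Gallium HUD VRAM graphs for this process.
// Assumption: this runs before the first OpenGL context is created.
// Returns true if the variable is set afterwards.
inline bool enableVramHud()
{
    // overwrite = 0 keeps a value the user already put in the environment
    if (setenv("GALLIUM_HUD", "requested-VRAM+VRAM-usage", 0) != 0)
        return false;
    return std::getenv("GALLIUM_HUD") != nullptr;
}
```

Calling `enableVramHud()` at the very start of `main()`, before Qt or SDL touch OpenGL, keeps any HUD configuration that was already set from the shell.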
Comment 7 jorge_monteagudo 2017-02-17 08:24:48 UTC
Created attachment 129687 [details]
Xorg.0.log from xserver-xorg-video-radeon 1:7.8.99+git1702081933.1351e4~gd~x

Sure! You'll find it attached.
Comment 8 Michel Dänzer 2017-03-14 08:00:47 UTC
Created attachment 130209 [details] [review]
If a TearFree flip fails, fall back to non-TearFree operation

This patch should make things chug along after this happens.

Did you get a chance to play with GALLIUM_HUD? Would be interesting to see how much of your VRAM is used when this happens.
Comment 9 Michel Dänzer 2017-03-14 08:06:51 UTC
Created attachment 130210 [details] [review]
If a TearFree flip fails, fall back to non-TearFree operation

The previous patch had leftover debugging code to induce this failure, please test this patch instead.
Comment 10 Michel Dänzer 2017-07-14 01:21:53 UTC
Thanks for the report. At least the worst should be fixed in Git master:

commit 94dc2b80f3ef0b2c17c20501d824fb0447d52e7a
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Tue Mar 14 16:57:17 2017 +0900

    If a TearFree flip fails, fall back to non-TearFree operation
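
The idea named in the commit title can be illustrated with a small stand-alone sketch. This is not the code from the commit: `CrtcState`, `presentFrame` and the `queueFlip` callback are hypothetical names, and the sketch assumes that TearFree simply stays off once a flip has failed, which may differ from what the driver really does.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical per-CRTC state; not the driver's real data structure.
struct CrtcState {
    bool tearFree = true;  // TearFree was requested in the X configuration
};

// queueFlip stands in for the page-flip request and returns 0 on success
// or a negative errno value such as -ENOMEM on failure.
// Returns true if the frame was presented via a flip, false if the caller
// should update the visible buffer directly (non-TearFree path).
inline bool presentFrame(CrtcState &crtc, const std::function<int()> &queueFlip)
{
    if (!crtc.tearFree)
        return false;  // already fell back earlier

    if (queueFlip() == 0)
        return true;   // flip queued, TearFree keeps working

    // The flip failed: stop using TearFree for this CRTC instead of
    // retrying, and warning, on every single frame.
    crtc.tearFree = false;
    return false;
}
```

With a fallback like this, a failed flip costs tear-free presentation but the screen keeps updating, instead of staying black while the same warning repeats in the log.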
