Bug 94249

Summary:

libdrm_amdgpu: Crash when built with clang

Product:

DRI

Reporter:

Armin K <krejzi>

Component:

libdrm

Assignee:

Default DRI bug account <dri-devel>

Status:

RESOLVED FIXED

QA Contact:

Severity:

normal

Priority:

medium

CC:

krejzi

Version:

unspecified

Hardware:

x86-64 (AMD64)

OS:

Linux (All)

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
Xorg log	none
gdb backtrace Xorg	none
gdb backtrace glxinfo	none
dmesg output from the current system	none
dmesg output from the current system	none
Make sure the second argument to container_of() is initialized	none

Description Armin K 2016-02-22 15:58:25 UTC

I recently got an HP ProBook 470 G3 laptop, which has onboard Intel Skylake graphics and an AMD Radeon R7 M340 GPU.

However, Linux seems to detect the GPU as follows:

01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265] [1002:6900] (rev 83)                                                  

That makes it a GCN 1.2 GPU, which uses the amdgpu driver. That, of course, doesn't work. The driver loads, but as soon as it does, it corrupts all the data shown in the lspci -v output. I also can't use it with DRI_PRIME, since the X server crashes while loading amdgpu (probably because it isn't a GPU that amdgpu supports).

Wikipedia identifies the card as a GCN 1.0 Oland GPU. The Windows Catalyst driver also reports it as an R7 M340, not an R7 M360.

It's probably a BIOS issue, but can it be worked around somehow?

Comment 1 Alex Deucher 2016-02-22 16:15:23 UTC

The driver is correct. It is a Topaz GPU. Try disabling runpm (append amdgpu.runpm=0 to the kernel command line in GRUB).

Comment 2 Armin K 2016-03-01 15:31:40 UTC

I'm sorry I didn't reply earlier; I didn't get the email notifying me of the reply.

Using runpm=0 didn't change a thing.

I forgot to post the corrupted lspci output mentioned in the original post:

01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265] [1002:6900] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

Without the amdgpu module loaded, it looks as shown below (with the other fields that -v includes).

The X server still crashes when the AMDGPU DDX is present. Here's the relevant part of the Xorg log:

Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE)
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) Backtrace:
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 0: /usr/libexec/Xorg (OsInit+0x35a) [0x5ca1aa]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 1: /lib/libc.so.6 (killpg+0x40) [0x7f5cfaf453df]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 2: /usr/lib/libdrm_amdgpu.so.1 (amdgpu_cs_submit+0x333) [0x7f5cf56ff383]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 3: /usr/lib/libdrm_amdgpu.so.1 (amdgpu_cs_submit+0x3b) [0x7f5cf56ff02b]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 4: /usr/lib/dri/radeonsi_dri.so (radeon_drm_winsys_create+0x14fad) [0x7f5cf451e3ed]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 5: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x45afe1) [0x7f5cf48ecce1]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 6: /usr/lib/dri/radeonsi_dri.so (amdgpu_winsys_create+0x18f9) [0x7f5cf450dc59]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 7: /usr/lib/dri/radeonsi_dri.so (amdgpu_winsys_create+0x5a60) [0x7f5cf4515d80]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 8: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x45b6ce) [0x7f5cf48eda8e]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 9: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x45b448) [0x7f5cf48ed6d8]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 10: /usr/lib/dri/radeonsi_dri.so (amdgpu_winsys_create+0xba8) [0x7f5cf450b688]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 11: /usr/lib/dri/radeonsi_dri.so (?+0xba8) [0x7f5cf4037938]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 12: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x2f81cf) [0x7f5cf462722f]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 13: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x2f5269) [0x7f5cf4621269]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 14: /usr/lib/libgbm.so.1 (gbm_surface_has_free_buffers+0xcfb) [0x7f5cf54f38cb]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 15: /usr/lib/libgbm.so.1 (gbm_surface_has_free_buffers+0xe8) [0x7f5cf54f2108]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 16: /usr/lib/libgbm.so.1 (gbm_create_device+0xa4) [0x7f5cf54f1dc4]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 17: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (_init+0x828b) [0x7f5cf5919d6b]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 18: /usr/libexec/Xorg (InitOutput+0xb52) [0x480732]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 19: /usr/libexec/Xorg (remove_fs_handlers+0x498) [0x4396d8]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 20: /lib/libc.so.6 (__libc_start_main+0xf0) [0x7f5cfaf307a0]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 21: /usr/libexec/Xorg (_start+0x29) [0x421529]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 22: ? (?+0x29) [0x29]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE)
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) Segmentation fault at address 0x10
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE)
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: Fatal server error:
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) Caught signal 11 (Segmentation fault). Server aborting

Ignore the /usr/libexec/gdm-x-session prefix; GNOME's GDM redirects the Xorg log to the systemd journal instead.

Comment 3 Armin K 2016-04-14 16:31:33 UTC

Ping? This is still problematic with Linux 4.5.1, xf86-video-amdgpu-1.1.0, xorg-server-1.18.3 and mesa-11.2.0.

Comment 4 Michel Dänzer 2016-07-29 06:35:24 UTC

First of all, for Topaz / Iceland I think it's best to use current Git snapshots of Mesa and libdrm_amdgpu.

Please attach the full Xorg log corresponding to the failure.

A gdb backtrace would also be helpful, see https://www.x.org/wiki/Development/Documentation/ServerDebugging/ .

Comment 5 Armin K 2016-07-29 14:28:34 UTC

I'm using Mesa 12.1.0-devel (git-c98c732) and libdrm-2.4.70. I think that's quite recent. I'm also using linux-4.7.0 atm.

You can find the requested information in the attached files.

Comment 6 Armin K 2016-07-29 14:30:35 UTC

Created attachment 125417
Xorg log

Here's the Xorg.log extracted from journal.

Comment 7 Armin K 2016-07-29 14:31:40 UTC

Created attachment 125418
gdb backtrace Xorg

Here's the best backtrace I could obtain from the Xorg server. I used a core file stored in journal. I'm not sure where those missing symbols are from.

Comment 8 Armin K 2016-07-29 14:32:47 UTC

Created attachment 125419
gdb backtrace glxinfo

Here's the backtrace of glxinfo when trying to use DRI3 PRIME (not loading amdgpu, but running DRI_PRIME=1 glxinfo with xf86-video-intel using DRI3).

Comment 9 Armin K 2016-07-29 14:34:07 UTC

Created attachment 125420
dmesg output from the current system

Comment 10 Armin K 2016-07-30 15:04:05 UTC

Created attachment 125437
dmesg output from the current system

Trying airlied's drm-next tree from today. The dmesg output is a bit different, but the segfaults are the same.

Comment 11 Michel Dänzer 2016-08-02 09:33:09 UTC

Weird, it seems to crash on this line of libdrm_amdgpu's amdgpu_vamgr_find_va:

	LIST_FOR_EACH_ENTRY_SAFE(hole, n, &mgr->va_holes, list) {

Are you using any unusual compiler (options)?

Comment 12 Armin K 2016-08-02 09:37:26 UTC

I use clang 3.9 svn (same crash with clang 3.8 by the way) with -march=skylake -g -O2 -pipe

Comment 13 Armin K 2016-08-02 09:44:54 UTC

Rebuilding libdrm with gcc instead of clang using the same flags fixes the problem.

The issue is indeed in libdrm_amdgpu. Now I get this with glxinfo:

OpenGL renderer string: Gallium 0.4 on AMD ICELAND (DRM 3.2.0 / 4.7.0-krejzi, LLVM 3.9.0)
OpenGL core profile version string: 4.3 (Core Profile) Mesa 12.1.0-devel (git-c98c732)
OpenGL core profile shading language version string: 4.30

I'd be grateful if this could be fixed, as I'm trying to get rid of gcc.

Comment 14 Michel Dänzer 2016-08-02 09:49:29 UTC

Created attachment 125478
Make sure the second argument to container_of() is initialized

Does this patch fix the problem with clang?

Comment 15 Armin K 2016-08-02 10:04:35 UTC

(In reply to Michel Dänzer from comment #14)
> Created attachment 125478
> Make sure the second argument to container_of() is initialized
> 
> Does this patch fix the problem with clang?

It appears to be working! Thanks.

Comment 16 Michel Dänzer 2016-08-03 09:38:04 UTC

Should be fixed with

commit b214b05ccd433c484a6a65e491a1a51b19e4811d
Author: Rob Clark <robclark@freedesktop.org>
Date:   Tue Aug 2 16:16:02 2016 -0400

    list: fix an issue with android build using clang
