Bug 94249

Summary: libdrm_amdgpu: Crash when built with clang
Product: DRI
Component: libdrm
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Reporter: Armin K <krejzi>
Assignee: Default DRI bug account <dri-devel>
QA Contact:
CC: krejzi
Whiteboard:
i915 platform:
i915 features:

Attachments:
- Xorg log
- gdb backtrace Xorg
- gdb backtrace glxinfo
- dmesg output from the current system
- dmesg output from the current system
- Make sure the second argument to container_of() is initialized

Description Armin K 2016-02-22 15:58:25 UTC
I recently got an HP ProBook 470 G3 laptop, which has onboard Intel Skylake graphics and an AMD Radeon R7 M340 GPU.

However, Linux seems to detect the GPU as follows:

01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265] [1002:6900] (rev 83)                                                  

That would make it a GCN 1.2 GPU, which tries to use the amdgpu driver. That, of course, doesn't work: the driver loads, but as soon as it does, the lspci -v output for the device becomes corrupted. I also can't use it with DRI_PRIME, since the X server crashes while loading amdgpu (probably because it is not an amdgpu-supported GPU).

Wikipedia identifies the mentioned card as a GCN 1.0 OLAND GPU. The Windows Catalyst driver also claims it's an R7 M340 and not an R7 M260.

It's probably a BIOS issue, but can it be worked around somehow?
Comment 1 Alex Deucher 2016-02-22 16:15:23 UTC
The driver is correct.  It is a topaz GPU.  Try disabling runpm (append amdgpu.runpm=0 on the kernel command line in grub).
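For reference, one way to make that parameter persistent is via the GRUB defaults file; this is a sketch, and the file location and existing option values vary by distribution:

```shell
# /etc/default/grub (example; your distribution's path and defaults may differ)
GRUB_CMDLINE_LINUX_DEFAULT="quiet amdgpu.runpm=0"

# Then regenerate the GRUB configuration, e.g.:
# grub-mkconfig -o /boot/grub/grub.cfg
```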
Comment 2 Armin K 2016-03-01 15:31:40 UTC
I'm sorry I didn't reply earlier, I didn't get the mail that there was a reply.

Using runpm=0 didn't change a thing.

I forgot to post the corrupted lspci data mentioned in the original post:

01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265] [1002:6900] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

Without the amdgpu module loaded, the output looks normal (with the other details -v includes).

The X server still crashes when AMDGPU DDX is present. Here's the relevant part of the Xorg.log:

Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE)
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) Backtrace:
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 0: /usr/libexec/Xorg (OsInit+0x35a) [0x5ca1aa]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 1: /lib/libc.so.6 (killpg+0x40) [0x7f5cfaf453df]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 2: /usr/lib/libdrm_amdgpu.so.1 (amdgpu_cs_submit+0x333) [0x7f5cf56ff383]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 3: /usr/lib/libdrm_amdgpu.so.1 (amdgpu_cs_submit+0x3b) [0x7f5cf56ff02b]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 4: /usr/lib/dri/radeonsi_dri.so (radeon_drm_winsys_create+0x14fad) [0x7f5cf451e3ed]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 5: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x45afe1) [0x7f5cf48ecce1]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 6: /usr/lib/dri/radeonsi_dri.so (amdgpu_winsys_create+0x18f9) [0x7f5cf450dc59]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 7: /usr/lib/dri/radeonsi_dri.so (amdgpu_winsys_create+0x5a60) [0x7f5cf4515d80]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 8: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x45b6ce) [0x7f5cf48eda8e]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 9: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x45b448) [0x7f5cf48ed6d8]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 10: /usr/lib/dri/radeonsi_dri.so (amdgpu_winsys_create+0xba8) [0x7f5cf450b688]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 11: /usr/lib/dri/radeonsi_dri.so (?+0xba8) [0x7f5cf4037938]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 12: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x2f81cf) [0x7f5cf462722f]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 13: /usr/lib/dri/radeonsi_dri.so (__driDriverGetExtensions_radeonsi+0x2f5269) [0x7f5cf4621269]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 14: /usr/lib/libgbm.so.1 (gbm_surface_has_free_buffers+0xcfb) [0x7f5cf54f38cb]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 15: /usr/lib/libgbm.so.1 (gbm_surface_has_free_buffers+0xe8) [0x7f5cf54f2108]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 16: /usr/lib/libgbm.so.1 (gbm_create_device+0xa4) [0x7f5cf54f1dc4]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 17: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (_init+0x828b) [0x7f5cf5919d6b]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 18: /usr/libexec/Xorg (InitOutput+0xb52) [0x480732]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 19: /usr/libexec/Xorg (remove_fs_handlers+0x498) [0x4396d8]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 20: /lib/libc.so.6 (__libc_start_main+0xf0) [0x7f5cfaf307a0]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 21: /usr/libexec/Xorg (_start+0x29) [0x421529]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) 22: ? (?+0x29) [0x29]
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE)
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) Segmentation fault at address 0x10
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE)
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: Fatal server error:
Mar 01 16:23:34 krejzi /usr/libexec/gdm-x-session[629]: (EE) Caught signal 11 (Segmentation fault). Server aborting

Ignore the /usr/libexec/gdm-x-session prefix; GNOME's gdm redirects the Xorg log to the systemd journal.
Comment 3 Armin K 2016-04-14 16:31:33 UTC
Ping? This is still problematic with Linux 4.5.1, xf86-video-amdgpu-1.1.0, xorg-server-1.18.3 and mesa-11.2.0.
Comment 4 Michel Dänzer 2016-07-29 06:35:24 UTC
First of all, for Topaz / Iceland I think it's best to use current Git snapshots of Mesa and libdrm_amdgpu.

Please attach the full Xorg log corresponding to the failure.

A gdb backtrace would also be helpful, see https://www.x.org/wiki/Development/Documentation/ServerDebugging/ .
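For instance, given a core file, a full backtrace can be captured roughly like this (a sketch; the binary and core-file paths here are examples, not taken from this report):

```shell
# Run gdb non-interactively against the crashed Xorg binary and its core
# (example paths -- substitute the actual binary and core file locations)
gdb --batch -ex 'bt full' -ex 'thread apply all bt' /usr/libexec/Xorg core
```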
Comment 5 Armin K 2016-07-29 14:28:34 UTC
I'm using Mesa 12.1.0-devel (git-c98c732) and libdrm-2.4.70, which I think is quite recent. I'm also using Linux 4.7.0 at the moment.

You can find the requested information in the attached files.
Comment 6 Armin K 2016-07-29 14:30:35 UTC
Created attachment 125417 [details]
Xorg log

Here's the Xorg.log extracted from journal.
Comment 7 Armin K 2016-07-29 14:31:40 UTC
Created attachment 125418 [details]
gdb backtrace Xorg

Here's the best backtrace I could obtain from the Xorg server. I used a core file stored in journal. I'm not sure where those missing symbols are from.
Comment 8 Armin K 2016-07-29 14:32:47 UTC
Created attachment 125419 [details]
gdb backtrace glxinfo

Here's the backtrace of glxinfo when trying to use DRI3 PRIME (not loading amdgpu, but running DRI_PRIME=1 glxinfo with xf86-video-intel using DRI3).
Comment 9 Armin K 2016-07-29 14:34:07 UTC
Created attachment 125420 [details]
dmesg output from the current system
Comment 10 Armin K 2016-07-30 15:04:05 UTC
Created attachment 125437 [details]
dmesg output from the current system

Trying airlied's drm-next tree from today. The dmesg output is a bit different, but the segfaults are still the same.
Comment 11 Michel Dänzer 2016-08-02 09:33:09 UTC
Weird, it seems to crash on this line of libdrm_amdgpu's amdgpu_vamgr_find_va:

	LIST_FOR_EACH_ENTRY_SAFE(hole, n, &mgr->va_holes, list) {

Are you using any unusual compiler (options)?
Comment 12 Armin K 2016-08-02 09:37:26 UTC
I use clang 3.9 svn (same crash with clang 3.8, by the way) with -march=skylake -g -O2 -pipe.
Comment 13 Armin K 2016-08-02 09:44:54 UTC
Rebuilding libdrm with gcc instead of clang using the same flags fixes the problem.

The issue is indeed in libdrm_amdgpu. Now I get this with glxinfo:

OpenGL renderer string: Gallium 0.4 on AMD ICELAND (DRM 3.2.0 / 4.7.0-krejzi, LLVM 3.9.0)
OpenGL core profile version string: 4.3 (Core Profile) Mesa 12.1.0-devel (git-c98c732)
OpenGL core profile shading language version string: 4.30

I'd be grateful if the issue were fixed; I'm trying to get rid of gcc.
Comment 14 Michel Dänzer 2016-08-02 09:49:29 UTC
Created attachment 125478 [details] [review]
Make sure the second argument to container_of() is initialized

Does this patch fix the problem with clang?
Comment 15 Armin K 2016-08-02 10:04:35 UTC
(In reply to Michel Dänzer from comment #14)
> Created attachment 125478 [details] [review]
> Make sure the second argument to container_of() is initialized
> 
> Does this patch fix the problem with clang?

It appears it's working! Thanks.
Comment 16 Michel Dänzer 2016-08-03 09:38:04 UTC
Should be fixed with

commit b214b05ccd433c484a6a65e491a1a51b19e4811d
Author: Rob Clark <robclark@freedesktop.org>
Date:   Tue Aug 2 16:16:02 2016 -0400

    list: fix an issue with android build using clang

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.