Bug 110261

Summary:	Segmentation fault when using vulkaninfo on Radeon
Product:	Mesa	Reporter:	Kenneth Endfinger <kaendfinger>
Component:	Drivers/Vulkan/Common	Assignee:	mesa-dev
Status:	RESOLVED NOTOURBUG	QA Contact:
Severity:	normal
Priority:	medium	CC:	airlied, chadversary, daniel, danylo.piliaiev, denys.kostin, jason
Version:	19.0
Hardware:	x86-64 (AMD64)
OS:	Linux (All)
Whiteboard:
i915 platform:		i915 features:
Attachments:	GDB session when running vulkaninfo.

Description Kenneth Endfinger 2019-03-28 03:32:51 UTC

Created attachment 143799
GDB session when running vulkaninfo.

When running vulkaninfo, it exits with a segmentation fault.

Running on Arch Linux:
mesa 19.0.1-2
vulkan-radeon 19.0.1-2
libxcb 1.13.1-1
xorg-server 1.20.4-1
vulkan-tools 1.1.101-1

Thread 1 "vulkaninfo" received signal SIGSEGV, Segmentation fault.
0x00007ffff543a0f7 in XGetXCBConnection () from /usr/lib/libX11-xcb.so.1
(gdb) info stack
#0  0x00007ffff543a0f7 in XGetXCBConnection () from /usr/lib/libX11-xcb.so.1
#1  0x00007ffff4ca58d1 in x11_surface_get_connection (icd_surface=0x555555972610) at ../mesa-19.0.1/src/vulkan/wsi/wsi_common_x11.c:404
#2  x11_surface_get_connection (icd_surface=0x555555972610) at ../mesa-19.0.1/src/vulkan/wsi/wsi_common_x11.c:401

I have attached the full GDB log.

Comment 1 Samuel Pitoiset 2019-03-28 15:43:06 UTC

I can't reproduce this, but it crashes inside the WSI code path.

Comment 2 Kenneth Endfinger 2019-03-28 16:07:33 UTC

I am also running an AMD eGPU over Thunderbolt:

Section "Device"
  Identifier "AMD"
  Driver "amdgpu"
  BusID "PCI:61:0:0"
  Option "AllowEmptyInitialConfiguration"
  Option "AllowExternalGpus"
EndSection

Section "Device"
  Identifier "Intel"
  Driver "intel"
  BusID "PCI:0:2:0"
EndSection

kendfinger@melt ~ $ sudo lspci | grep "VGA"
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
01:00.0 VGA compatible controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev a1)
3d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev ef)

kendfinger@melt ~ $ xrandr --listproviders 
Providers: number : 2
Provider 0: id: 0xc1 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 6 outputs: 5 associated providers: 1 name:Radeon RX 570 Series @ pci:0000:3d:00.0
Provider 1: id: 0x45 cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 4 outputs: 2 associated providers: 1 name:Intel

Comment 3 Denis 2019-03-29 10:24:27 UTC

Hi, I am able to reproduce this issue on Manjaro with an Intel Coffee Lake (CFL) CPU, using both the system Mesa (18.3.4) and Mesa built from git.

Can I provide any additional information to help with debugging?

Core was generated by `vulkaninfo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa3451b50f7 in XGetXCBConnection () from /usr/lib/libX11-xcb.so.1
[Current thread is 1 (Thread 0x7fa34945c740 (LWP 7380))]
(gdb) bt
#0  0x00007fa3451b50f7 in XGetXCBConnection () from /usr/lib/libX11-xcb.so.1
#1  0x00007fa342a267c1 in ?? () from /usr/lib/libvulkan_intel.so
#2  0x0000561d0b6e0693 in ?? ()
#3  0x0000561d0b6d5f72 in ?? ()
#4  0x00007fa349852223 in __libc_start_main () from /usr/lib/libc.so.6
#5  0x0000561d0b6d67be in ?? ()

That's the backtrace from my core dump.

Comment 4 Denis 2019-04-05 08:58:27 UTC

It looks like I've bisected this issue. Jason, could you please take a look? The bisect points to your commit:

[manjaro@manjaro-pc mesa]$ git bisect good
68df93ecbcee6215ac49e0c6f62ae818d2bc9962 is the first bad commit
commit 68df93ecbcee6215ac49e0c6f62ae818d2bc9962
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Thu Sep 21 13:54:55 2017 -0700

    anv: Trivially implement VK_KHR_device_group

    Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

:040000 040000 a48855544644df9cb612163b98c96ac3b53b78d4 98add0f3169c5897d9e47116565e992813312109 M    src

Comment 5 Jason Ekstrand 2019-04-05 23:03:16 UTC

Can you reproduce with a full debug build in gdb and run "bt all"?

Comment 6 Kenneth Endfinger 2019-04-08 04:23:56 UTC

Is this what you are looking for?

(gdb) thread apply all backtrace

Thread 2 (Thread 0x7ffff38f8700 (LWP 8010)):
#0  0x00007ffff7ba7afc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007ffff5669e24 in cnd_wait (mtx=0x5555556f8208, cond=0x5555556f8230) at ../mesa-19.0.1/src/../include/c11/threads_posix.h:155
#2  util_queue_thread_func (input=input@entry=0x555555723f70) at ../mesa-19.0.1/src/util/u_queue.c:270
#3  0x00007ffff5669b48 in impl_thrd_routine (p=<optimized out>) at ../mesa-19.0.1/src/../include/c11/threads_posix.h:87
#4  0x00007ffff7ba1a9d in start_thread () from /usr/lib/libpthread.so.0
#5  0x00007ffff7cdfb23 in clone () from /usr/lib/libc.so.6

Thread 1 (Thread 0x7ffff7a10740 (LWP 8004)):
#0  0x00007ffff54300f7 in XGetXCBConnection () from /usr/lib/libX11-xcb.so.1
#1  0x00007ffff565c961 in x11_surface_get_connection (icd_surface=0x555555a19510) at ../mesa-19.0.1/src/vulkan/wsi/wsi_common_x11.c:404
#2  x11_surface_get_connection (icd_surface=0x555555a19510) at ../mesa-19.0.1/src/vulkan/wsi/wsi_common_x11.c:401
#3  x11_surface_get_support (icd_surface=0x555555a19510, wsi_device=0x5555557428f0, queueFamilyIndex=<optimized out>, pSupported=0x7fffffffdef4) at ../mesa-19.0.1/src/vulkan/wsi/wsi_common_x11.c:424
#4  0x00005555555626e3 in AppGpuDumpQueueProps (out=0x7ffff7da35c0 <_IO_2_1_stdout_>, id=0, gpu=0x555555a1b190) at /usr/src/debug/Vulkan-Tools-1.1.101/vulkaninfo/vulkaninfo.c:4461
#5  AppGpuDump (gpu=0x555555a1b190, out=0x7ffff7da35c0 <_IO_2_1_stdout_>) at /usr/src/debug/Vulkan-Tools-1.1.101/vulkaninfo/vulkaninfo.c:4764
#6  0x0000555555557f24 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/Vulkan-Tools-1.1.101/vulkaninfo/vulkaninfo.c:5328

Comment 7 Kenneth Endfinger 2019-04-08 04:26:19 UTC

Oddly enough, with vulkan-tools 1.1.102:

/build/vulkan-tools/src/Vulkan-Tools-1.1.102/vulkaninfo/vulkaninfo.c:4504: failed with VK_ERROR_OUT_OF_HOST_MEMORY

is now the error...

Comment 8 Denis 2019-04-08 10:03:29 UTC

>/build/vulkan-tools/src/Vulkan-Tools-1.1.102/vulkaninfo/vulkaninfo.c:4504: failed with VK_ERROR_OUT_OF_HOST_MEMORY

Actually, that's exactly what I got with the "normal" (non-crashing) version. I discussed this error with a developer, and he said it is probably not critical. So during bisection I marked that error as "good" and the segfault as "bad".

>Can you reproduce with a full debug build in gdb and run "bt all"?
Sorry for the slow response. Is this still needed, or did Kenneth already provide the log you wanted?

Comment 9 Denis 2019-04-08 10:32:43 UTC

`bt all` didn't work, so I did the same as Kenneth (`thread apply all backtrace`). Output below:

(gdb) bt all
No symbol "all" in current context.
(gdb) thread apply all backtrace

Thread 2 (Thread 0x7ffff6c37700 (LWP 11457)):
#0  0x00007ffff7bc2afc in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007ffff771d5c7 in cnd_wait (cond=0x55555584be30, mtx=0x55555584be08) at ../include/c11/threads_posix.h:155
#2  0x00007ffff771e0a6 in util_queue_thread_func (input=0x55555557ffd0) at ../src/util/u_queue.c:272
#3  0x00007ffff771d3f8 in impl_thrd_routine (p=0x555555658040) at ../include/c11/threads_posix.h:87
#4  0x00007ffff7bbca9d in start_thread () from /usr/lib/libpthread.so.0
#5  0x00007ffff7cfab23 in clone () from /usr/lib/libc.so.6

Thread 1 (Thread 0x7ffff7a2b740 (LWP 11453)):
#0  0x00007ffff7f780ce in xcb_send_request_with_fds64 () from /usr/lib/libxcb.so.1
#1  0x00007ffff7f7866a in xcb_send_request () from /usr/lib/libxcb.so.1
#2  0x00007ffff7f87405 in xcb_query_extension () from /usr/lib/libxcb.so.1
#3  0x00007ffff770a3ce in wsi_x11_connection_create (wsi_dev=0x555555600510, conn=0xa0ec8148e5894855)
    at ../src/vulkan/wsi/wsi_common_x11.c:135
#4  0x00007ffff770a782 in wsi_x11_get_connection (wsi_dev=0x555555600510, conn=0xa0ec8148e5894855)
    at ../src/vulkan/wsi/wsi_common_x11.c:242
#5  0x00007ffff770ac90 in x11_surface_get_support (icd_surface=0x55555584d3a0, wsi_device=0x555555600510, queueFamilyIndex=0, 
    pSupported=0x7fffffffd8d4) at ../src/vulkan/wsi/wsi_common_x11.c:428
#6  0x00007ffff7709086 in wsi_common_get_surface_support (wsi_device=0x555555600510, queueFamilyIndex=0, _surface=0x55555584d3a0, 
    pSupported=0x7fffffffd8d4) at ../src/vulkan/wsi/wsi_common.c:724
#7  0x00007ffff746074e in anv_GetPhysicalDeviceSurfaceSupportKHR (physicalDevice=0x555555600070, queueFamilyIndex=0, 
    surface=0x55555584d3a0, pSupported=0x7fffffffd8d4) at ../src/intel/vulkan/anv_wsi.c:91
#8  0x0000555555562693 in ?? ()
#9  0x0000555555557f72 in ?? ()
#10 0x00007ffff7c23223 in __libc_start_main () from /usr/lib/libc.so.6
#11 0x00005555555587be in ?? ()

Comment 10 Jason Ekstrand 2019-04-08 11:22:00 UTC

Unfortunately, thanks to Vulkan passing everything as struct pointers, a backtrace with `bt full` isn't as useful as one would like. That said, given where it's crashing, I'm 93% sure that both of those backtraces are due to the client (vulkaninfo) providing us with a bogus X11 connection/display.
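To illustrate why a bad client handle ends up crashing inside libX11-xcb or libxcb, here is a minimal, hypothetical C sketch of the dispatch that frames #1 to #3 go through. As far as I can tell from Mesa 19.0, x11_surface_get_connection() returns the stored connection as-is for an XCB surface and calls XGetXCBConnection() on the stored Display pointer for an Xlib surface. The mock_* types and functions below are invented stand-ins for Display, xcb_connection_t and the VkIcdSurfaceXlib/VkIcdSurfaceXcb structs; this is not Mesa's actual code. The point is that the driver never validates the handle: a bogus Display faults inside XGetXCBConnection() (first trace), while a garbage connection value is passed along until libxcb dereferences it in xcb_query_extension() (second trace).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for Display, xcb_connection_t and the ICD
 * surface structs; layouts are simplified for illustration only. */
typedef struct mock_xcb_connection { int fd; } mock_xcb_connection;
typedef struct mock_display { mock_xcb_connection *xcb; } mock_display;

typedef enum { MOCK_PLATFORM_XLIB, MOCK_PLATFORM_XCB } mock_platform;

typedef struct mock_surface {
    mock_platform platform;          /* tag recorded when the surface is created */
    union {
        mock_display *dpy;           /* meaningful only for MOCK_PLATFORM_XLIB */
        mock_xcb_connection *conn;   /* meaningful only for MOCK_PLATFORM_XCB */
    } u;
} mock_surface;

/* Stand-in for XGetXCBConnection(): it dereferences the Display, so a
 * bogus Display pointer faults right here. */
static mock_xcb_connection *mock_get_xcb_connection(mock_display *dpy)
{
    return dpy->xcb;
}

/* Mirrors the shape of x11_surface_get_connection(): whatever handle the
 * application stored in the surface is trusted without validation. */
static mock_xcb_connection *mock_surface_get_connection(const mock_surface *s)
{
    assert(s != NULL);
    if (s->platform == MOCK_PLATFORM_XCB)
        return s->u.conn;                      /* returned unchecked */
    return mock_get_xcb_connection(s->u.dpy);  /* dereferenced here */
}
```

With valid handles both branches yield the same connection; with an invalid one there is nothing the driver can check, which is consistent with the crash going away after the vulkan-tools update rather than a Mesa change.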

Comment 11 algebro 2019-04-19 14:52:45 UTC

I believe I'm running into a related (or the same) error:

Arch Linux with an AMD RX580 GPU
Mesa 19.0.2-1
vulkan-radeon 19.0.2-1
libxcb 1.13.1-1
xorg-server 1.20.4-1
vulkan-tools 1.1.102-1

Process 3567 (vulkaninfo) of user 1000 dumped core.
                                             
Stack trace of thread 3567:
#0  0x00006efde23374d1 xcb_send_request_with_fds64 (libxcb.so.1)
#1  0x00006efde233766a xcb_send_request (libxcb.so.1)
#2  0x00006efde2346405 xcb_query_extension (libxcb.so.1)
#3  0x00006efde1c273ed n/a (libvulkan_radeon.so)
#4  0x00006efde1c27bca n/a (libvulkan_radeon.so)
#5  0x00000b7f648c572d n/a (vulkaninfo)
#6  0x00000b7f648baf92 n/a (vulkaninfo)
#7  0x00006efde1fe2223 __libc_start_main (libc.so.6)
#8  0x00000b7f648bb7de n/a (vulkaninfo)
                                             
Stack trace of thread 3568:
#0  0x00006efde1f81afc pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1  0x00006efde1c33874 n/a (libvulkan_radeon.so)
#2  0x00006efde1c33598 n/a (libvulkan_radeon.so)
#3  0x00006efde1f7ba9d start_thread (libpthread.so.0)
#4  0x00006efde20b9af3 __clone (libc.so.6)

Let me know if you think this is related or if I should open another bug report.

Comment 12 Denis 2019-06-14 15:58:50 UTC

Jason, you were absolutely right.

Test data:
GPU: Intel HD 620
OS: Manjaro

vulkan-tools 1.1.101-1: issue reproducible
vulkan-tools 1.1.106-1: issue not reproducible

Can somebody check this on Radeon? Otherwise I will try to do it later.

Comment 13 Danylo 2019-06-14 16:15:05 UTC

Yes, on Radeon it no longer crashes with vulkan-tools 1.1.106.
