Bug 102432

Summary: [regression] Steam fails to start with libdrm 2.4.83
Product: DRI
Reporter: Gregor Münch <gr.muench>
Component: libdrm
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED INVALID
QA Contact:
Severity: normal
Priority: medium
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:

Description Gregor Münch 2017-08-27 15:42:06 UTC
After the update, Steam does not start. Console log:

*** Error in `/home/greg/.local/share/Steam/ubuntu12_32/steam': realloc(): invalid next size: 0x585f5050 ***
ILocalize::AddFile() failed to load file "public/steambootstrapper_english.txt".

Downgrading to libdrm 2.4.82 fixes this.

Radeon HD 7970 with amdgpu enabled instead of radeon.
Comment 1 Gregor Münch 2017-08-27 17:04:21 UTC
Forget everything, solution:

OK, the problem is that nobody updated lib32-libdrm in Arch. It's still at 2.4.81, so way too old. I have now installed the git versions of both packages, which effectively brings them to 2.4.83 r0, and the problem went away.

In the meantime, I even tried to bisect:

0167e6836e91947418fec36c3b4b396760d0f345 is the first bad commit
commit 0167e6836e91947418fec36c3b4b396760d0f345
Author: Jan Vesely <jan.vesely@rutgers.edu>
Date:   Fri Jul 28 01:46:45 2017 -0400

    amdgpu: Add FX-9800P Bristol Ridge iGPU id
    
    Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
    Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

:040000 040000 8b07e8bdce21260ecccb07f424348b838a787472 2c65bba8b27ddf5329fa9f1e3501f33225ca909a M	data

Confirmed by reverting, though it's pointless.
Comment 2 Jan Vesely 2017-08-31 17:05:42 UTC
(In reply to Gregor Münch from comment #1)
> Forget everything, solution:
> 
> Ok, the problem is that nobody updated lib32-libdrm in arch. Its still at
> 2.4.81 so way to old. I installed now the git versions of both packages
> witch brings them effectively to 2.4.83 r0 and the problem went away.
> 
> In the meantime, I even tried to bisect:
> 
> 0167e6836e91947418fec36c3b4b396760d0f345 is the first bad commit
> commit 0167e6836e91947418fec36c3b4b396760d0f345
> Author: Jan Vesely <jan.vesely@rutgers.edu>
> Date:   Fri Jul 28 01:46:45 2017 -0400
> 
>     amdgpu: Add FX-9800P Bristol Ridge iGPU id
>     
>     Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
>     Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
> 
> :040000 040000 8b07e8bdce21260ecccb07f424348b838a787472
> 2c65bba8b27ddf5329fa9f1e3501f33225ca909a M	data
> 
> confirmed by reverting, though its pointless.

The list of IDs needs to be synced across the 32-bit and 64-bit versions.
The parsing mechanism is flimsy, and adding an entry to /usr/share/libdrm/amdgpu.ids crashes libdrm.
Comment 3 Michel Dänzer 2017-09-01 01:08:41 UTC
(In reply to Jan Vesely from comment #2)
> The parsing mechanism is flimsy and adding an entry to
> /usr/share/libdrm/amdgpu.ids crashes libdrm

Can you share more details about how adding an entry to /usr/share/libdrm/amdgpu.ids causes a crash?
Comment 4 Jan Vesely 2017-09-01 18:44:01 UTC
(In reply to Michel Dänzer from comment #3)
> (In reply to Jan Vesely from comment #2)
> > The parsing mechanism is flimsy and adding an entry to
> > /usr/share/libdrm/amdgpu.ids crashes libdrm
> 
> Can you share more details about how adding en entry to
> /usr/share/libdrm/amdgpu.ids causes a crash?

using libdrm-2.4.82-1.fc26.x86_64 (there have been no other changes in amdgpu_asic_id.c in later releases)

sudo vim /usr/share/libdrm/amdgpu.ids
Add a line like the one in the above-mentioned commit.

glxinfo -> crash
clinfo -> crash
glxgears -> crash

Looking at the code, adding one entry means that table_size == table_max_size + 1, so all of the memory initially allocated by calloc (line 132) is already in use.
The memset (line 191) then writes beyond the allocated memory and corrupts libc's internal structures, and the following realloc (line 194) crashes.
See below for a valgrind trace showing the invalid write and a gdb trace showing the crash.

Moving the memset (line 191) after the last realloc block should be enough to fix the problem.
I have verified that adding 2 entries to /usr/share/libdrm/amdgpu.ids is a workaround: the realloc on line 159 then triggers and prevents the illegal write from the memset (line 191).
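
The off-by-one described above can be sketched as a small counter-only model. This is not the libdrm source: the struct, the function names, and the exact growth and final-resize conditions are assumptions made for illustration, based only on the description in this comment (growth check at the top of the loop, a buffer of table_max_size + 1 slots, a final realloc to table_size + 1 slots).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the table bookkeeping; it tracks counters only
 * and allocates nothing, so it cannot itself corrupt memory. */
struct table_model {
    size_t size;     /* entries parsed so far (table_size) */
    size_t max_size; /* table_max_size; the buffer holds max_size + 1 slots */
};

/* Replay the parse loop for n_lines valid lines. The growth check is
 * assumed to run at the top of each iteration, before the current line
 * is counted. */
static struct table_model parse_model(size_t initial_max, size_t n_lines)
{
    struct table_model t = { 0, initial_max };

    assert(initial_max > 0); /* doubling a zero capacity would never grow */
    for (size_t i = 0; i < n_lines; i++) {
        if (t.size > t.max_size)
            t.max_size *= 2; /* stands in for the realloc inside the loop */
        t.size++;
    }
    return t;
}

/* Terminator written before the final realloc: index `size` has to fit
 * in the max_size + 1 slots that exist at that point. */
static bool terminator_fits_before_resize(struct table_model t)
{
    return t.size < t.max_size + 1;
}

/* Terminator written after the final realloc, which is assumed to resize
 * the buffer to size + 1 slots whenever size != max_size. */
static bool terminator_fits_after_resize(struct table_model t)
{
    size_t slots = (t.size != t.max_size) ? t.size + 1 : t.max_size + 1;
    return t.size < slots;
}
```

With a made-up initial capacity of 4, parse_model(4, 5) ends with size 5 and max_size 4, so the early terminator write lands one slot past the buffer, while parse_model(4, 6) triggers the in-loop growth and stays in bounds. That matches the observation that one extra entry crashes and two extra entries do not. Writing the terminator after the final resize fits in both cases.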


clinfo backtrace from gdb (the machine is accessed remotely, but the crashes happen even if used locally):
Program received signal SIGABRT, Aborted.
0x00007ffff761769b in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff761769b in raise () from /lib64/libc.so.6
#1  0x00007ffff76194a0 in abort () from /lib64/libc.so.6
#2  0x00007ffff765d8e1 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff766bd19 in _int_realloc () from /lib64/libc.so.6
#4  0x00007ffff766e7eb in realloc () from /lib64/libc.so.6
#5  0x00007fffedcd120c in amdgpu_parse_asic_ids (p_asic_id_table=p_asic_id_table@entry=0x5555557c9ff8) at amdgpu_asic_id.c:194
#6  0x00007fffedcd386c in amdgpu_device_initialize (fd=fd@entry=4, major_version=major_version@entry=0x7fffffffd5c4, minor_version=minor_version@entry=0x7fffffffd5c8, 
    device_handle=device_handle@entry=0x7fffffffd5e0) at amdgpu_device.c:276
#7  0x00007fffee1b3adf in amdgpu_winsys_create (fd=fd@entry=4, screen_create=0x7fffee2054d0 <radeonsi_screen_create>) at amdgpu_winsys.c:562
#8  0x00007fffee101d8f in create_screen (fd=4) at pipe_radeonsi.c:14
#9  0x00007ffff7296348 in clover::device::device(clover::platform&, pipe_loader_device*) () from /lib64/libMesaOpenCL.so.1
#10 0x00007ffff72bb68b in clover::intrusive_ref<clover::device> clover::create<clover::device, clover::platform&, pipe_loader_device*&>(clover::platform&, pipe_loader_device*&) ()
   from /lib64/libMesaOpenCL.so.1
#11 0x00007ffff72bb2d9 in clover::platform::platform() () from /lib64/libMesaOpenCL.so.1
#12 0x00007ffff7265248 in __static_initialization_and_destruction_0 () from /lib64/libMesaOpenCL.so.1
#13 0x00007ffff7265278 in _GLOBAL__sub_I_platform.cpp () from /lib64/libMesaOpenCL.so.1
#14 0x00007ffff7de6d73 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#15 0x00007ffff7debcca in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#16 0x00007ffff7732a2f in _dl_catch_error () from /lib64/libc.so.6
#17 0x00007ffff7deb1d9 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#18 0x00007ffff79b2f26 in dlopen_doit () from /lib64/libdl.so.2
#19 0x00007ffff7732a2f in _dl_catch_error () from /lib64/libc.so.6
#20 0x00007ffff79b36a5 in _dlerror_run () from /lib64/libdl.so.2
#21 0x00007ffff79b2fb1 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#22 0x00007ffff7bbb982 in _load_icd (lib_path=0x5555557734d0 "libMesaOpenCL.so.1", num_icds=0) at ocl_icd_loader.c:184
#23 _open_driver (file_path=<optimized out>, dir_path=<optimized out>, num_icds=0) at ocl_icd_loader.c:237
#24 _open_drivers (dir_path=<optimized out>, dir=<optimized out>) at ocl_icd_loader.c:250
#25 __initClIcd () at ocl_icd_loader.c:646
#26 _initClIcd_real () at ocl_icd_loader.c:702
#27 0x00007ffff7bbd994 in _initClIcd () at ocl_icd_loader.c:724
#28 clGetPlatformIDs (num_entries=0, platforms=0x0, num_platforms=0x55555576a798 <num_platforms>) at ocl_icd_loader.c:846
#29 0x00005555555596c2 in main (argc=1, argv=0x7fffffffe108) at src/clinfo.c:2723

Running under valgrind prevents the crash, but it reports an invalid write in that function:
==3701== Command: clinfo
==3701== 
==3701== Invalid write of size 8
==3701==    at 0xF13A1DE: UnknownInlinedFun (string3.h:90)
==3701==    by 0xF13A1DE: amdgpu_parse_asic_ids (amdgpu_asic_id.c:191)
==3701==    by 0xF13C86B: amdgpu_device_initialize (amdgpu_device.c:276)
==3701==    by 0xEBF6ADE: amdgpu_winsys_create (amdgpu_winsys.c:562)
==3701==    by 0xEB44D8E: create_screen (pipe_radeonsi.c:14)
==3701==    by 0x5AAD347: clover::device::device(clover::platform&, pipe_loader_device*) (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x5AD268A: clover::intrusive_ref<clover::device> clover::create<clover::device, clover::platform&, pipe_loader_device*&>(clover::platform&, pipe_loader_device*&) (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x5AD22D8: clover::platform::platform() (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x5A7C247: __static_initialization_and_destruction_0(int, int) (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x5A7C277: _GLOBAL__sub_I_platform.cpp (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x4010D72: _dl_init (in /usr/lib64/ld-2.25.so)
==3701==    by 0x4015CC9: dl_open_worker (in /usr/lib64/ld-2.25.so)
==3701==    by 0x53B0A2E: _dl_catch_error (in /usr/lib64/libc-2.25.so)
==3701==  Address 0x57c72b0 is 0 bytes after a block of size 2,464 alloc'd
==3701==    at 0x4C30A1E: calloc (vg_replace_malloc.c:711)
==3701==    by 0xF139E92: amdgpu_parse_asic_ids (amdgpu_asic_id.c:132)
==3701==    by 0xF13C86B: amdgpu_device_initialize (amdgpu_device.c:276)
==3701==    by 0xEBF6ADE: amdgpu_winsys_create (amdgpu_winsys.c:562)
==3701==    by 0xEB44D8E: create_screen (pipe_radeonsi.c:14)
==3701==    by 0x5AAD347: clover::device::device(clover::platform&, pipe_loader_device*) (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x5AD268A: clover::intrusive_ref<clover::device> clover::create<clover::device, clover::platform&, pipe_loader_device*&>(clover::platform&, pipe_loader_device*&) (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x5AD22D8: clover::platform::platform() (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x5A7C247: __static_initialization_and_destruction_0(int, int) (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x5A7C277: _GLOBAL__sub_I_platform.cpp (in /usr/lib64/libMesaOpenCL.so.1.0.0)
==3701==    by 0x4010D72: _dl_init (in /usr/lib64/ld-2.25.so)
==3701==    by 0x4015CC9: dl_open_worker (in /usr/lib64/ld-2.25.so)
