Bug 104681

Summary: Einstein@Home BOINC FGRPB1G GPU app crash
Product: Mesa
Reporter: PMouse <porcelain_mouse>
Component: Gallium/StateTracker/Clover
Assignee: mesa-dev
Status: RESOLVED MOVED
QA Contact: mesa-dev
Severity: normal
Priority: medium
CC: germano.massullo, pavel.ondracka
Version: 17.2
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:

Description PMouse 2018-01-18 01:26:14 UTC
Since summer 2015, every BOINC project with OpenCL support that I have tried has crashed. The crash has changed over time, but the apps still will not run, even after many upgrades throughout the stack: kernel 4.14, LLVM, the AMDGPU driver, and Polaris 10 hardware.

This may be related to bug 98164 and/or bug 104182, but neither is an exact fit for my symptoms. Here is the latest crash report:

LATeah0051L_1100.0_0_0.0_18787350_1
Workunit ID: 334687702
Created: 17 Jan 2018 23:52:27 GMT
Sent: 18 Jan 2018 0:29:22 GMT
Report deadline: 1 Feb 2018 0:29:22 GMT
Received: 18 Jan 2018 0:39:50 GMT
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 6 (0x00000006) Unknown error code
Computer: <redacted>
Run time (sec): 1.32
CPU time (sec): 0.07
Peak working set size (MB): 0
Peak swap size (MB): 0
Peak disk usage (MB): 0.03
Validation state: Invalid
Granted credit: 0
Application: Gamma-ray pulsar binary search #1 on GPUs v1.18 (FGRPopencl1K-ati) x86_64-pc-linux-gnu
Stderr output

<core_client_version>7.8.0</core_client_version>
<![CDATA[
<message>
process exited with code 6 (0x6, -250)
</message>
<stderr_txt>
16:30:19 (30668): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16

16:30:19 (30668): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'.
16:30:19 (30668): [debug]: 1e+16 fp, 5.3e+09 fp/s, 1981364 s, 550h22m44s26
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0051L.dat --alpha 4.42281478648 --delta -0.0345027837249 --skyRadius 2.152570e-06 --ldiBins 15 --f0start 1092.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 3.344368011e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah0051L_1100_18787350.dat --debug 1 --device 0 -o LATeah0051L_1100.0_0_0.0_18787350_1_0.out
output files: 'LATeah0051L_1100.0_0_0.0_18787350_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah0051L_1100.0_0_0.0_18787350_1_0' 'LATeah0051L_1100.0_0_0.0_18787350_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0051L_1100.0_0_0.0_18787350_1_1'
16:30:19 (30668): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
16:30:19 (30668): [debug]: glibc version/release: 2.26/stable
16:30:19 (30668): [debug]: Set up communication with graphics process.
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
free(): invalid pointer

-- signal handler called: signal 6
1 stack frames obtained for this thread:
Frame 31:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48b101)
	Source file: hs_boinc_extras.c (Function: sighandler / Line: 291)
Frame 30:
	Binary file: /lib64/libc.so.6 (0x7f0534b3066b)
	Offset info: gsignal+0xcb
Frame 29:
	Binary file: /lib64/libc.so.6 (0x7f0534b3066b)
	Offset info: gsignal+0xcb
Frame 28:
	Binary file: /lib64/libc.so.6 (0x7f0534b32381)
	Offset info: abort+0x141
Frame 27:
	Binary file: /lib64/libc.so.6 (0x7f0534b7aa57)
	Offset info: +0x81a57
Frame 26:
	Binary file: /lib64/libc.so.6 (0x7f0534b819aa)
	Offset info: +0x889aa
Frame 25:
	Binary file: /lib64/libc.so.6 (0x7f0534b8447c)
	Offset info: +0x8b47c
Frame 24:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x6a7248)
	Offset info: _ZNSt13runtime_errorD2Ev+0x58
	Source file: basic_string.h (Function: &#144;~ / Line: 249)
	Source file: basic_string.h (Function: ~basic_string / Line: 539)
	Source file: stdexcept.cc (Function: &#144;~ / Line: 68)
Frame 23:
	Binary file: /lib64/libMesaOpenCL.so.1 (0x7f0529e109be)
	Offset info: +0x209be
Frame 22:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x6995df)
	Source file: eh_throw.cc (Function:  / Line: 52)
Frame 21:
	Binary file: /lib64/libMesaOpenCL.so.1 (0x7f0529e92a3f)
	Offset info: +0xa2a3f
Frame 20:
	Binary file: /lib64/libMesaOpenCL.so.1 (0x7f0529e3c8e4)
	Offset info: +0x4c8e4
Frame 19:
	Binary file: /lib64/libMesaOpenCL.so.1 (0x7f0529e3c914)
	Offset info: +0x4c914
Frame 18:
	Binary file: /lib64/ld-linux-x86-64.so.2 (0x7f0535884e23)
	Offset info: +0x10e23
Frame 17:
	Binary file: /lib64/ld-linux-x86-64.so.2 (0x7f0535889d7a)
	Offset info: +0x15d7a
Frame 16:
	Binary file: /lib64/libc.so.6 (0x7f0534c54f8f)
	Offset info: _dl_catch_error+0x8f
Frame 15:
	Binary file: /lib64/ld-linux-x86-64.so.2 (0x7f0535889289)
	Offset info: +0x15289
Frame 14:
	Binary file: /lib64/libdl.so.2 (0x7f0535231f96)
	Offset info: +0xf96
Frame 13:
	Binary file: /lib64/libc.so.6 (0x7f0534c54f8f)
	Offset info: _dl_catch_error+0x8f
Frame 12:
	Binary file: /lib64/libdl.so.2 (0x7f0535232715)
	Offset info: +0x1715
Frame 11:
	Binary file: /lib64/libdl.so.2 (0x7f0535232021)
	Offset info: dlopen+0x41
Frame 10:
	Binary file: /lib64/libOpenCL.so.1 (0x7f0535659a82)
	Offset info: +0x5a82
Frame 9:
	Binary file: /lib64/libOpenCL.so.1 (0x7f053565ba74)
	Offset info: clGetPlatformIDs+0x114
Frame 8:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x5babf4)
	Offset info: _Z24boinc_get_opencl_ids_auxPciiPP13_cl_device_idPP15_cl_platform_id+0x74
	Source file: unknown (Function:  / Line: 0)
Frame 7:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x5bb11a)
	Offset info: _Z20boinc_get_opencl_idsPP13_cl_device_idPP15_cl_platform_id+0xe6
	Source file: unknown (Function:  / Line: 0)
Frame 6:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48bb06)
	Offset info: eah_boinc_get_opencl_ids+0x26
	Source file: hs_boinc_options.cpp (Function: eah_boinc_get_opencl_ids / Line: 136)
Frame 5:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x48db74)
	Offset info: gen_fft_get_ctx+0x44
	Source file: unknown (Function: gen_fft_get_ctx / Line: 0)
Frame 4:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x4795fc)
	Offset info: MAIN+0x15c
	Source file: HSgammaPulsar.c (Function: MAIN / Line: 4230)
Frame 3:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x46c06f)
	Offset info: main+0x5ff
	Source file: hs_boinc_extras.c (Function: worker / Line: 833)
	Source file: hs_boinc_extras.c (Function: main / Line: 1039)
Frame 2:
	Binary file: /lib64/libc.so.6 (0x7f0534b1a00a)
	Offset info: __libc_start_main+0xea
Frame 1:
	Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x46e569)
	Source file: unknown (Function: _start / Line: 0)

End of stcaktrace
16:30:19 (30668): called boinc_finish

</stderr_txt>
]]>
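
Read from Frame 1 upward, the trace above suggests that the abort is raised while clGetPlatformIDs in libOpenCL.so.1 is still inside dlopen() of libMesaOpenCL.so.1, with a std::runtime_error destructor from the app binary on the stack in between. A minimal Python sketch for condensing such a trace into a per-binary call chain follows; the helper names (parse_frames, call_chain) are hypothetical, and TRACE is an abbreviated excerpt of the frames above, not the full trace.

```python
import re
from itertools import groupby

# Abbreviated excerpt of the backtrace above (frames 28, 24, 23, 11 and 9 only).
TRACE = """\
Frame 28:
    Binary file: /lib64/libc.so.6 (0x7f0534b32381)
    Offset info: abort+0x141
Frame 24:
    Binary file: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati (0x6a7248)
    Offset info: _ZNSt13runtime_errorD2Ev+0x58
Frame 23:
    Binary file: /lib64/libMesaOpenCL.so.1 (0x7f0529e109be)
    Offset info: +0x209be
Frame 11:
    Binary file: /lib64/libdl.so.2 (0x7f0535232021)
    Offset info: dlopen+0x41
Frame 9:
    Binary file: /lib64/libOpenCL.so.1 (0x7f053565ba74)
    Offset info: clGetPlatformIDs+0x114
"""

FRAME_RE = re.compile(r"^Frame (\d+):$")
BINARY_RE = re.compile(r"^\s*Binary file: (\S+) \(0x[0-9a-f]+\)$")
OFFSET_RE = re.compile(r"^\s*Offset info: (\S*)\+0x[0-9a-f]+$")


def parse_frames(text):
    """Return one dict per frame: frame number, binary basename, symbol (or None)."""
    frames = []
    for line in text.splitlines():
        if m := FRAME_RE.match(line):
            frames.append({"frame": int(m.group(1)), "binary": None, "symbol": None})
        elif frames and (m := BINARY_RE.match(line)):
            frames[-1]["binary"] = m.group(1).rsplit("/", 1)[-1]
        elif frames and (m := OFFSET_RE.match(line)):
            # An empty symbol means the frame could not be resolved (no debug symbols).
            frames[-1]["symbol"] = m.group(1) or None
    return frames


def call_chain(frames):
    """Collapse consecutive frames in the same binary, outermost caller first.

    In this trace format Frame 1 is _start and the highest frame is the
    signal handler, so ascending frame numbers run from caller to callee.
    """
    ordered = sorted(frames, key=lambda f: f["frame"])
    return [binary for binary, _ in groupby(f["binary"] for f in ordered)]


if __name__ == "__main__":
    print(" -> ".join(call_chain(parse_frames(TRACE))))
```

Frames whose symbol comes back as None (for example Frame 23 in libMesaOpenCL.so.1) are the ones that would need debug symbols to be readable.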
Comment 1 Pavel Ondračka 2018-01-18 11:54:52 UTC
The backtrace is missing debug symbols. Please install the missing symbols, rerun the app directly under gdb, and attach the full backtrace. BTW, gdb will print the command needed to install the missing symbols when you run the app with it.
You may also try running it under valgrind and attaching the output; it could give some clue as to where that invalid free comes from. I cannot test this ATM, as I only have pre-GCN hardware around, which has different problems (https://bugs.llvm.org/show_bug.cgi?id=35910).
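
The two runs suggested above could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, APP is the binary path from the stderr output in the description, and APP_ARGS is abbreviated (the full argument list from the "command line:" entry would be needed for a real run). Since the paths are relative, the commands would presumably have to be started from the BOINC slot directory. The gdb options used (-batch, -ex, --args) and the valgrind options (--track-origins, --log-file) are standard ones.

```python
import shlex

# Binary path as it appears in the stderr output of the description.
APP = ("../../projects/einstein.phys.uwm.edu/"
       "hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati")

# Abbreviated on purpose: copy the full argument list from the
# "command line:" entry in the stderr output for a real run.
APP_ARGS = ["--debug", "1", "--device", "0"]


def gdb_command(app, args):
    """Run the app under gdb non-interactively and dump a full backtrace on the crash."""
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt full", "--args", app, *args]


def valgrind_command(app, args, log_file="valgrind.log"):
    """Run the app under valgrind (memcheck is the default tool), logging to a file."""
    return ["valgrind", "--track-origins=yes", f"--log-file={log_file}", app, *args]


if __name__ == "__main__":
    # Print shell-quoted command lines to paste into a terminal.
    for cmd in (gdb_command(APP, APP_ARGS), valgrind_command(APP, APP_ARGS)):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch usable on a machine without gdb or valgrind installed; the resulting valgrind.log and gdb output are what would be attached to the bug.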

In fact, the crashes in the seti and milkyway apps are probably unrelated, so it is better to limit this to one app per bug. Let's keep this bug about the Einstein@Home binary pulsar search (FGRPB1G) app. BTW, the Einstein@Home FGRP5 GPU app was working with clover the last time I checked, although I have not received any GPU work for that app in a long time.

Changing the component, as this is definitely not a Vulkan problem; probably clover -> mesa core. Also refining the title, since it currently implies this is a regression, while the mentioned apps probably never worked.
Comment 2 Adam Jackson 2019-09-18 18:02:32 UTC
For some reason that I don't have the time to investigate, the bugzilla->gitlab migration script doesn't like this bug. I'm closing this; please open a new issue in gitlab if this is still a problem:

https://gitlab.freedesktop.org/mesa/mesa/
