Summary: | r300_dri.so SIGSEGV in llvm_pipeline_generic under Cinnamon | ||
---|---|---|---|
Product: | Mesa | Reporter: | Anthony Ciani <anthonyjciani> |
Component: | Drivers/Gallium/r300 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | major | ||
Priority: | medium | CC: | maraeo, michael.panzlaff |
Version: | 18.0 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Attachments: |
registers, stack and jit function disassembly
patch
Crash with GALLIVM_DEBUG=tgsi,ir,asm |
Description
Anthony Ciani
2018-05-16 01:39:19 UTC
Error occurs on two identical Dell Inspiron 1501's, one with 3GB of memory and the other with 4GB.

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD/ATI] RS480/RS482/RS485 Host Bridge [1002:5950] (rev 10)
00:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] RC4xx/RS4xx PCI Bridge [int gfx] [1002:5a3f]
00:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] RC4xx/RS4xx PCI Express Port 2 [1002:5a37]
00:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] RC4xx/RS4xx PCI Express Port 3 [1002:5a38]
00:12.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI] SB600 Non-Raid-5 SATA [1002:4380]
00:13.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI0) [1002:4387]
00:13.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI1) [1002:4388]
00:13.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI2) [1002:4389]
00:13.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI3) [1002:438a]
00:13.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI4) [1002:438b]
00:13.5 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB Controller (EHCI) [1002:4386]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller [1002:4385] (rev 14)
00:14.1 IDE interface [0101]: Advanced Micro Devices, Inc. [AMD/ATI] SB600 IDE [1002:438c]
00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) [1002:4383]
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD/ATI] SB600 PCI to LPC Bridge [1002:438d]
00:14.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge [1002:4384]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Address Map [1022:1101]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] DRAM Controller [1022:1102]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Miscellaneous Control [1022:1103]
01:05.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] RS482M [Mobility Radeon Xpress 200] [1002:5975]
05:00.0 Network controller [0280]: Broadcom Limited BCM4311 802.11b/g WLAN [14e4:4311] (rev 01)
08:00.0 Ethernet controller [0200]: Broadcom Limited BCM4401-B0 100Base-TX [14e4:170c] (rev 02)
08:01.0 SD Host controller [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter [1180:0822] (rev 19)
08:01.1 System peripheral [0880]: Ricoh Co Ltd R5C843 MMC Host Controller [1180:0843] (rev 01)

I suppose it crashes in the jit-compiled code (debug symbols should help with identifying that, but not help any further if that's the case).
Can you print out the faulting instruction (gdb x/i address or so)?
Also, if that would be some SSE instruction, try to see if the memory operand is aligned (or just not addressable).

Created attachment 139995 [details]
registers, stack and jit function disassembly

(In reply to Roland Scheidegger from comment #2)
> I suppose it crashes in the jit-compiled code (debug symbols should help
> with identifying that, but not help any further if that's the case).
> Can you print out the faulting instruction (gdb x/i address or so)?
> Also, if that would be some SSE instruction, try to see if the memory
> operand is aligned (or just not addressable).

I've tried to get the game "Thimbleweed Park" to run and it seems like it's crashing for a very similar reason. This is the stack trace:

#0 0x00007ffff7fdc000 in ?? ()
#1 0x00007ffff27288d8 in llvm_pipeline_generic (middle=middle@entry=0x2bc9110, fetch_info=fetch_info@entry=0x7fffffffdf80, in_prim_info=in_prim_info@entry=0x7fffffffdfa0) at draw/draw_pt_fetch_shade_pipeline_llvm.c:408
#2 0x00007ffff2728f86 in llvm_middle_end_linear_run (middle=0x2bc9110, start=0, count=<optimized out>, prim_flags=0) at draw/draw_pt_fetch_shade_pipeline_llvm.c:588
#3 0x00007ffff2635d56 in vsplit_segment_simple_linear (vsplit=0x2bc6340, vsplit=0x2bc6340, icount=4, istart=0, flags=0) at draw/draw_pt_vsplit_tmp.h:226
#4 vsplit_run_linear (frontend=0x2bc6340, start=0, count=4) at draw/draw_split_tmp.h:70
#5 0x00007ffff262d71a in draw_pt_arrays (draw=draw@entry=0x2ba3b20, prim=6, start=0, count=count@entry=4) at draw/draw_pt.c:175
#6 0x00007ffff262df50 in draw_vbo (draw=0x2ba3b20, info=0x7fffffffe0d0, info@entry=0x7fffffffe1a0) at draw/draw_pt.c:609
#7 0x00007ffff273b319 in r300_swtcl_draw_vbo (pipe=0x2b7ac80, info=0x7fffffffe1a0) at r300_render.c:862
#8 0x00007ffff273d9e6 in r300_stencilref_draw_vbo (pipe=0x2b7ac80, info=0x7fffffffe1a0) at r300_render_stencilref.c:113
#9 0x00007ffff261cce7 in cso_draw_arrays (cso=<optimized out>, mode=mode@entry=6, start=start@entry=0, count=count@entry=4) at cso_cache/cso_context.c:1724
#10 0x00007ffff2413ee4 in st_draw_quad (st=st@entry=0x2cbddb0, x0=x0@entry=-1, y0=y0@entry=-0.899999976, x1=x1@entry=1, y1=y1@entry=0.899999976, z=1, s0=s0@entry=0, t0=t0@entry=0, s1=s1@entry=0, t1=0, color=color@entry=0x2c9a44c, num_instances=num_instances@entry=1) at state_tracker/st_draw.c:435
#11 0x00007ffff23f8df1 in clear_with_quad (clear_buffers=<optimized out>, ctx=0x2c987c0) at state_tracker/st_cb_clear.c:300
#12 st_Clear (ctx=0x2c987c0, mask=2) at state_tracker/st_cb_clear.c:454
#13 0x00007ffff2244dc5 in clear (no_error=false, mask=<optimized out>, ctx=0x2c987c0) at main/clear.c:221
#14 _mesa_Clear (mask=<optimized out>) at main/clear.c:244
#15 0x000000000049e364 in ?? ()
#16 0x0000000000481fd3 in ?? ()
#17 0x000000000048359f in ?? ()
#18 0x00007ffff6c9aa87 in __libc_start_main (main=0x40e130, argc=1, argv=0x7fffffffe5f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe5e8) at ../csu/libc-start.c:310
#19 0x000000000040f04a in ?? ()

To answer your original question I've provided information (stack, registers, assembly of jit function) in the attached text file.

PS: Hope I did everything correct. First time for me posting something on the bugzilla.

You have the same tcl-less chipset?

So the assembly shows the segfault isn't due to alignment, but simply because it's a null pointer.

This is also mesa 18.0? I'm just asking because according to the line numbers (r300_render_stencilref.c:113) it's doing two-sided stencil emulation, which I can't see how it could happen with the emulated clear (which definitely doesn't enable two-sided stencil), so that looks a little bit suspect. Although maybe the line numbers aren't quite accurate due to optimization...

You could try setting (with a debug build) DRAW_USE_LLVM="0" and see if this fixes the crash - and if it still crashes it should be easier to figure out what pointer is zero.

You could print out the shader (with a debug build) with GALLIVM_DEBUG=tgsi,ir,asm to see what the assembly really might do. I think though this is trying to read a vertex buffer (for position probably) which just isn't there.

Or you could try setting (with a debug build) DRAW_USE_LLVM="0" and see if this fixes the crash - and if it still crashes it should be easier to figure out what pointer is zero.
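
As an illustration of the diagnosis above (a fetch through a vertex buffer pointer that is NULL), here is a minimal C sketch. It is not Mesa code: `sketch_vertex_buffer` and `sketch_fetch_position` are hypothetical names, and the three-float position layout is an assumption made for the example. It only shows why an unmapped buffer faults on the first read, and how an explicit check turns that fault into a reportable error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a vertex buffer binding (not a Mesa struct). */
struct sketch_vertex_buffer {
    const unsigned char *map; /* CPU pointer to the vertex data, or NULL */
    unsigned stride;          /* bytes between consecutive vertices */
};

/*
 * Fetch one three-float position and expand it to four components.
 * Returns 0 on success, -1 if the buffer has no CPU-visible storage.
 * Without the NULL check, the memcpy below would read from address
 * 0 + index * stride, which is the kind of access that faults.
 */
static int sketch_fetch_position(const struct sketch_vertex_buffer *vb,
                                 unsigned index, float dst[4])
{
    assert(vb != NULL && dst != NULL);

    if (vb->map == NULL)
        return -1;

    memcpy(dst, vb->map + (size_t)index * vb->stride, 3 * sizeof(float));
    dst[3] = 1.0f; /* default w for a position */
    return 0;
}
```

Calling `sketch_fetch_position()` on a binding whose `map` is NULL returns -1 instead of crashing, which is roughly what a debug check at the fetch site would report.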
I think I have an idea why it might fail, but someone more familiar with the u_upload stuff and r300 would have to look at it: r300_set_vertex_buffers_swtcl() would provide the vertex buffers to draw, and I suspect it's always setting a NULL buffer, because I think it's going to be not a user buffer, but there won't be a malloced_buffer either - the u_upload code will call r300_buffer_create() but it won't alloc the malloced_buffer because the PIPE_BIND_CUSTOM bit will always be set (because r300->stream_uploader is the same as r300->uploader which will set that by default). I'm not quite sure actually how it's supposed to work, maybe the logic in r300_buffer_create should always use the malloced_buffer path when !has_tcl?

Created attachment 140026 [details] [review]
patch

Would you please test the attached patch? Thanks.

(In reply to Roland Scheidegger from comment #4)
> You have the same tcl-less chipset?

I'm sorry to say, but I don't really know what a tcl-less chipset is. I'm using a Dell Latitude D531. Perhaps an lspci output helps?

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RS690 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RS690 PCI to PCI Bridge (Internal gfx)
00:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RS690 PCI to PCI Bridge (PCI Express Port 1)
00:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RS690 PCI to PCI Bridge (PCI Express Port 2)
00:12.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB600 Non-Raid-5 SATA
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI0)
00:13.1 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI1)
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI2)
00:13.3 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI3)
00:13.4 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI4)
00:13.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB Controller (EHCI)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 14)
00:14.1 IDE interface: Advanced Micro Devices, Inc. [AMD/ATI] SB600 IDE
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB600 PCI to LPC Bridge
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RS690M [Radeon Xpress 1200/1250/1270]
03:01.0 CardBus bridge: O2 Micro, Inc. Cardbus bridge (rev 21)
03:01.4 FireWire (IEEE 1394): O2 Micro, Inc. Firewire (IEEE 1394) (rev 02)
09:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5755M Gigabit Ethernet PCI Express (rev 02)
0b:00.0 Network controller: Broadcom Limited BCM4311 802.11a/b/g (rev 01)

> This is also mesa 18.0? I'm just asking because according to the line
> numbers (r300_render_stencilref.c:113) it's doing two-sided stencil
> emulation, which I can't see how it could happen with the emulated clear
> (which definitely doesn't enable two-sided stencil), so that looks a little
> bit suspect. Although maybe the line numbers aren't quite accurate due to
> optimization...

I initially used mesa 18.0.? from Debian Unstable but my latest tests with the stacktrace were made using mesa from the git repository (master). I sadly can't really help with the stencil stuff. I know a bit of OpenGL but I literally have no idea about the implementation behind it.

> You could try setting (with a debug build) DRAW_USE_LLVM="0" and see if this
> fixes the crash - and if it still crashes it should be easier to figure out
> what pointer is zero.

Still segfaults. Here is a stacktrace:

#0 util_format_r32g32b32_float_fetch_rgba_float (dst=0x7fffffffde70, src=0x0, i=0, j=0) at util/u_format_table.c:10080
#1 0x00007ffff24a9032 in generic_run_one (vert=0x4521aa0, instance_id=0, start_instance=0, elt=0, tg=<optimized out>) at translate/translate_generic.c:629
#2 generic_run (translate=0x49c8bc0, start=<optimized out>, count=<optimized out>, start_instance=0, instance_id=0, output_buffer=<optimized out>) at translate/translate_generic.c:723
#3 0x00007ffff244e040 in fetch (output=<optimized out>, fetch_info=<optimized out>, fetch= {void (char *, const struct draw_fetch_info *, struct pt_fetch *)} 0x7ffff244dec3 <fetch_pipeline_generic+131>) at draw/draw_pt_fetch_shade_pipeline.c:161
#4 fetch_pipeline_generic (middle=0x2bb8f60, fetch_info=fetch_info@entry=0x7fffffffe010, in_prim_info=in_prim_info@entry=0x7fffffffe030) at draw/draw_pt_fetch_shade_pipeline.c:268
#5 0x00007ffff244e48d in fetch_pipeline_linear_run (middle=<optimized out>, start=<optimized out>, count=<optimized out>, prim_flags=<optimized out>) at draw/draw_pt_fetch_shade_pipeline.c:426
#6 0x00007ffff245368a in vsplit_segment_simple_linear (vsplit=0x2bb63a0, vsplit=0x2bb63a0, icount=4, istart=0, flags=0) at draw/draw_pt_vsplit_tmp.h:226
#7 vsplit_run_linear (frontend=0x2bb63a0, start=0, count=4) at draw/draw_split_tmp.h:60
#8 0x00007ffff244b4a8 in draw_pt_arrays (draw=draw@entry=0x2ba3980, prim=6, start=0, count=count@entry=4) at draw/draw_pt.c:149
#9 0x00007ffff244b9c4 in draw_vbo (draw=0x2ba3980, info=0x7fffffffe150, info@entry=0x7fffffffe1f0) at draw/draw_pt.c:564
#10 0x00007ffff257a5c9 in r300_swtcl_draw_vbo (pipe=0x2b7aae0, info=0x7fffffffe1f0) at r300_render.c:862
#11 0x00007ffff2439de7 in cso_draw_arrays (cso=<optimized out>, mode=mode@entry=6, start=start@entry=0, count=count@entry=4) at cso_cache/cso_context.c:1724
#12 0x00007ffff222ec59 in st_draw_quad (st=st@entry=0x2d3a290, x0=x0@entry=-1, y0=y0@entry=-0.899999976, x1=x1@entry=1, y1=y1@entry=0.899999976, z=1, s0=s0@entry=0, t0=t0@entry=0, s1=s1@entry=0, t1=0, color=color@entry=0x2d1692c, num_instances=num_instances@entry=1) at state_tracker/st_draw.c:428
#13 0x00007ffff2213c39 in clear_with_quad (clear_buffers=<optimized out>, ctx=0x2d14ca0) at state_tracker/st_cb_clear.c:293
#14 st_Clear (ctx=0x2d14ca0, mask=2) at state_tracker/st_cb_clear.c:445
#15 0x000000000049e364 in ?? ()
#16 0x0000000000481fd3 in ?? ()
#17 0x000000000048359f in ?? ()
#18 0x00007ffff6c9aa87 in __libc_start_main (main=0x40e130, argc=1, argv=0x7fffffffe5f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe5e8) at ../csu/libc-start.c:310
#19 0x000000000040f04a in ?? ()

> You could print out the shader (with a debug build) with
> GALLIVM_DEBUG=tgsi,ir,asm to see what the assembly really might do. I think
> though this is trying to read a vertex buffer (for position probably) which
> just isn't there.

I'll provide a new attachment with the output of that.

Created attachment 140027 [details]
Crash with GALLIVM_DEBUG=tgsi,ir,asm
To provide the attachment for my previous reply.

(In reply to Marek Olšák from comment #5)
> Created attachment 140026 [details] [review]
> patch
>
> Would you please test the attached patch? Thanks.

Yay! It seems to have fixed it. It also fixed a crash I got in Cube 2 Sauerbraten (probably because of the same reason).

(In reply to Michael Panzlaff from comment #6)
> (In reply to Roland Scheidegger from comment #4)
> > You have the same tcl-less chipset?
>
> I'm sorry to say, but I don't really know what a tcl-less chipset is. I'm
> using a Dell Latitude D531. Perhaps an lspci output helps?

Sorry, it's not actually really "tcl-less" by this generation. That was the term used by ATI for "transform, clipping, lighting". But of course for the r300 generation it's really just vertex shader hw, which neither the rs480 nor the rs690 have (but all discrete r300 based chips do); it's all done in software.
So, it's not quite the same chipset as what the OP got but they behave pretty much the same (I think rs690 would be somewhat faster though than rs480, but not sure anymore...).
Glad Marek has a patch to get it working again!

Thanks for testing. I pushed the patch as 17a42062ccdb3bf5624435db9598e4353756771f. Closing.

Thanks. That patch resolved the issue with clear_with_quad and Cinnamon as well. While I was looking at the source, I did notice that many of the functions assumed that their inputs were valid. I was speculating that clear_with_quad may have calculated a zero or even negative number of elements and passed that along to draw_pt_arrays, which would have passed a NULL array or negative index to llvm.

You might want to apply that patch to the 17.1, 17.2 and 17.3 branches as well. While researching this bug I saw several posts from people having llvm problems on Radeon hardware with Fedora and Ubuntu versions based on the 17.x branches.

17.x branches are no longer supported (and 18.0 is uncertain). Users and distros are encouraged to switch to Mesa 18.1 if they want to obtain future fixes. |
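
The hypothesis raised earlier in the thread, that `r300_buffer_create()` skips the CPU-side `malloced_buffer` whenever the `PIPE_BIND_CUSTOM` bit is set, can be modelled with a small C sketch. This is only a model of the reasoning in that comment: it is not the r300 source and not the pushed patch, and `SKETCH_BIND_CUSTOM`, `sketch_wants_cpu_storage_described` and `sketch_wants_cpu_storage_suggested` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag standing in for the PIPE_BIND_CUSTOM bit. */
#define SKETCH_BIND_CUSTOM (1u << 0)

/*
 * Behaviour as described in the hypothesis: CPU-side storage is skipped
 * whenever the custom bind flag is set, regardless of whether the chip
 * has hardware vertex processing.
 */
static bool sketch_wants_cpu_storage_described(unsigned bind, bool has_tcl)
{
    (void)has_tcl; /* not consulted in this variant */
    return (bind & SKETCH_BIND_CUSTOM) == 0;
}

/*
 * Variant suggested in the same comment: when the chip has no hardware
 * vertex processing (!has_tcl), always keep CPU-side storage so the
 * software vertex path has something to read from.
 */
static bool sketch_wants_cpu_storage_suggested(unsigned bind, bool has_tcl)
{
    return !has_tcl || (bind & SKETCH_BIND_CUSTOM) == 0;
}
```

Under this model, a buffer created through an uploader that always sets the custom flag gets no CPU storage on a chip without hardware vertex processing in the first variant, which would leave the software draw path with a NULL pointer; the second variant keeps the storage in that case and leaves chips with hardware vertex processing unchanged.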