Bug 92027 - 'test/stress-test' segfaults on i686 (pixman 0.32.6)
Summary: 'test/stress-test' segfaults on i686 (pixman 0.32.6)
Status: RESOLVED FIXED
Alias: None
Product: pixman
Classification: Unclassified
Component: pixman (show other bugs)
Version: other
Hardware: x86 (IA32) Linux (All)
: medium major
Assignee: Søren Sandmann Pedersen
QA Contact: Søren Sandmann Pedersen
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-09-16 19:32 UTC by Ludovic Courtès
Modified: 2015-09-22 10:25 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Fix use of uninitialized value (482 bytes, patch)
2015-09-16 19:59 UTC, Ludovic Courtès
Details | Splinter Review
Avoid numerical overflow in pointer arithmetic. (771 bytes, patch)
2015-09-21 13:33 UTC, Ludovic Courtès
Details | Splinter Review

Description Ludovic Courtès 2015-09-16 19:32:48 UTC
We found out that 'test/stress-test' segfaults on i686-linux-gnu (this is with glibc 2.22, GCC 4.9.3, libpng 1.5.21):

$ OMP_NUM_THREADS=1 MALLOC_CHECK_=2  ../libtool --mode=execute ~ludo/.guix-profile/bin/gdb --args ./stress-test
GNU gdb (GDB) 7.10
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /tmp/nix-build-pixman-0.32.6.drv-0/pixman-0.32.6/test/.libs/stress-test...done.
(gdb) r
Starting program: /tmp/nix-build-pixman-0.32.6.drv-0/pixman-0.32.6/test/.libs/stress-test 
/gnu/store/8l7y994zvfslrjlma01scfkxhqs5cc7j-bash-4.3.39/bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
warning: File "/gnu/store/2q07masxvx2ja4ir76g4chsg833kyd02-glibc-2.22/lib/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
	add-auto-load-safe-path /gnu/store/2q07masxvx2ja4ir76g4chsg833kyd02-glibc-2.22/lib/libthread_db-1.0.so
line to your configuration file "/homeless-shelter/.gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/homeless-shelter/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

Program received signal SIGSEGV, Segmentation fault.
fast_fetch_r5g6b5 (iter=0xffff5e68, mask=0x0) at pixman-fast-path.c:2191
2191		t0 = ((sr << 16) & 0x00FF0000) | ((sg << 8) & 0x0000FF00) |
(gdb) bt
#0  fast_fetch_r5g6b5 (iter=0xffff5e68, mask=0x0) at pixman-fast-path.c:2191
#1  0xf7f761b5 in general_composite_rect (imp=0x8057018, info=0xffffbf44) at pixman-general.c:212
#2  0xf7f2c6f0 in pixman_image_composite32 (op=PIXMAN_OP_CONJOINT_CLEAR, src=0x80592b8, mask=0x8059778, dest=0x80590f0, src_x=20, 
    src_y=918, mask_x=42, mask_y=1, dest_x=0, dest_y=0, width=11764, height=17) at pixman.c:707
#3  0x0804ba81 in run_test (seed=seed@entry=94, verbose=<optimized out>, mod=<optimized out>) at stress-test.c:928
#4  0x0804bdc8 in main._omp_fn.0 () at stress-test.c:1031
#5  0xf7e801cc in GOMP_parallel () from /gnu/store/r0p8d0a3n2zlkj1l5a9xv368yhbgc5bh-gcc-4.9.3-lib/lib/libgomp.so.1
#6  0x08049b65 in main (argc=1, argv=0xffffc194) at stress-test.c:1028
(gdb) info threads
  Id   Target Id         Frame 
* 1    process 16589 "stress-test" fast_fetch_r5g6b5 (iter=0xffff5e68, mask=0x0) at pixman-fast-path.c:2191
(gdb) p iter->buffer
$1 = (uint32_t *) 0xce40

Valgrind provides additional details:

$ OMP_NUM_THREADS=1 MALLOC_CHECK_=2  ../libtool --mode=execute /gnu/store/k9bbfhfa7cwkr76id5day8ip5gw8qklc-valgrind-3.10.1/bin/valgrind --track-origins=yes ./stress-test
==24842== Memcheck, a memory error detector
==24842== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==24842== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==24842== Command: /tmp/nix-build-pixman-0.32.6.drv-0/pixman-0.32.6/test/.libs/stress-test
==24842== 
==24842== Use of uninitialised value of size 4
==24842==    at 0x40452D7: convert_and_store_pixel (pixman-access.c:389)
==24842==    by 0x40452D7: store_scanline_g1 (pixman-access.c:507)
==24842==    by 0x4046E9E: dest_write_back_narrow (pixman-bits-image.c:836)
==24842==    by 0x40851DF: general_composite_rect (pixman-general.c:216)
==24842==    by 0x403B6EF: pixman_image_composite32 (pixman.c:707)
==24842==    by 0x804BA80: run_test (stress-test.c:928)
==24842==    by 0x804BDC7: main._omp_fn.0 (stress-test.c:1031)
==24842==    by 0x41871CB: GOMP_parallel (in /gnu/store/r0p8d0a3n2zlkj1l5a9xv368yhbgc5bh-gcc-4.9.3-lib/lib/libgomp.so.1.0.0)
==24842==    by 0x8049B64: main (stress-test.c:1028)
==24842==  Uninitialised value was created by a stack allocation
==24842==    at 0x4084F3F: general_composite_rect (pixman-general.c:115)
==24842== 
==24842== Conditional jump or move depends on uninitialised value(s)
==24842==    at 0x408746D: linear_get_scanline_narrow (pixman-linear-gradient.c:176)
==24842==    by 0x40851A6: general_composite_rect (pixman-general.c:211)
==24842==    by 0x403B6EF: pixman_image_composite32 (pixman.c:707)
==24842==    by 0x804BA80: run_test (stress-test.c:928)
==24842==    by 0x804BDC7: main._omp_fn.0 (stress-test.c:1031)
==24842==    by 0x41871CB: GOMP_parallel (in /gnu/store/r0p8d0a3n2zlkj1l5a9xv368yhbgc5bh-gcc-4.9.3-lib/lib/libgomp.so.1.0.0)
==24842==    by 0x8049B64: main (stress-test.c:1028)
==24842==  Uninitialised value was created by a stack allocation
==24842==    at 0x4084F3F: general_composite_rect (pixman-general.c:115)
==24842== 
==24842== Conditional jump or move depends on uninitialised value(s)
==24842==    at 0x404ECFD: combine_mask (pixman-combine32.c:130)
==24842==    by 0x404ECFD: combine_hsl_luminosity_u (pixman-combine32.c:1260)
==24842==    by 0x40851D4: general_composite_rect (pixman-general.c:214)
==24842==    by 0x403B6EF: pixman_image_composite32 (pixman.c:707)
==24842==    by 0x804BA80: run_test (stress-test.c:928)
==24842==    by 0x804BDC7: main._omp_fn.0 (stress-test.c:1031)
==24842==    by 0x41871CB: GOMP_parallel (in /gnu/store/r0p8d0a3n2zlkj1l5a9xv368yhbgc5bh-gcc-4.9.3-lib/lib/libgomp.so.1.0.0)
==24842==    by 0x8049B64: main (stress-test.c:1028)
==24842==  Uninitialised value was created by a stack allocation
==24842==    at 0x4084F3F: general_composite_rect (pixman-general.c:115)
==24842== 
==24842== Conditional jump or move depends on uninitialised value(s)
==24842==    at 0x40A68C3: core_combine_in_u_pixel_sse2 (pixman-sse2.c:819)
==24842==    by 0x40A68C3: sse2_combine_in_reverse_u (pixman-sse2.c:947)
==24842==    by 0x40851D4: general_composite_rect (pixman-general.c:214)
==24842==    by 0x403B6EF: pixman_image_composite32 (pixman.c:707)
==24842==    by 0x804BA80: run_test (stress-test.c:928)
==24842==    by 0x804BDC7: main._omp_fn.0 (stress-test.c:1031)
==24842==    by 0x41871CB: GOMP_parallel (in /gnu/store/r0p8d0a3n2zlkj1l5a9xv368yhbgc5bh-gcc-4.9.3-lib/lib/libgomp.so.1.0.0)
==24842==    by 0x8049B64: main (stress-test.c:1028)
==24842==  Uninitialised value was created by a stack allocation
==24842==    at 0x4084F3F: general_composite_rect (pixman-general.c:115)

Any ideas?
Comment 1 Ludovic Courtès 2015-09-16 19:33:36 UTC
I can provide instructions on how to reproduce it using GNU Guix.
Comment 2 Ludovic Courtès 2015-09-16 19:59:22 UTC
Created attachment 118319 [details] [review]
Fix use of uninitialized value

The attached patch placates Valgrind but I'm still getting segfaults in the build container that I cannot reproduce outside of the container.
Comment 3 andreas 2015-09-17 19:41:32 UTC
This also concerns the development snapshot 0.33.2.
Comment 4 Siarhei Siamashka 2015-09-19 20:04:58 UTC
I can't reproduce the problem on my 64-bit system (using CFLAGS="-O2 -m32" to build 32-bit code).(In reply to Ludovic Courtès from comment #2)
> Created attachment 118319 [details] [review] [review]
> Fix use of uninitialized value
> 
> The attached patch placates Valgrind but I'm still getting segfaults in the
> build container that I cannot reproduce outside of the container.

The uninitialized value alarm is a known quirk, explained here with more details: http://lists.freedesktop.org/archives/pixman/2013-December/003164.html
Using memset is not desired in the performance critical part of code, so we would want a better solution. Relying on TLS (as mentioned in the mailing list) is not a very good option either because we also have to support weird operating systems, such as Microsoft Windows. See bug #34842
In any case, it is very unlikely to be related to your segfault.
Comment 5 Siarhei Siamashka 2015-09-19 22:11:40 UTC
(In reply to Ludovic Courtès from comment #1)
> I can provide instructions on how to reproduce it using GNU Guix.

I tried to install GCC 4.9.3, but still can't reproduce the problem. Additional information would be appreciated.

Also please try "--disable-openmp" configure option (so that we don't need "OMP_NUM_THREADS=1") and also "--disable-gtk --disable-libpng --disable-shared --enable-static-testprogs" as an additional experiment. Removing OpenMP and libtool out of the way may make debugging easier.
Comment 6 Ludovic Courtès 2015-09-21 13:30:45 UTC
(The trick to deterministically reproduce the issue was to turn off ASLR either in GDB or with 'setarch -R', which is what Guix's build daemon does.)

After some more debugging, I found what seems to be the root cause.  'general_composite_rect' allocates a fixed-size buffer on the stack and then computes 'src_buffer' and 'dest_buffer' as a function of that buffer's address and 'width'.

The problem is that the buffer address is typically high (since it's on the stack), so for high values of 'width', adding 'width * Bpp' to that address can overflow.  Thus, the test to determine whether to fall back to heap allocation:

    if (ALIGN (dest_buffer + width * Bpp) >
	    scanline_buffer + sizeof (stack_scanline_buffer))

will just fail, because 'dest_buffer + width * Bpp' is numerically smaller than 'scanline_buffer'.  Here's an example GDB session:

(gdb) p dest_buffer
$1 = (uint8_t *) 0xd0f0 <error: Cannot access memory at address 0xd0f0>
(gdb) p src_buffer
$2 = (uint8_t *) 0xffff6150 ""
(gdb) p mask_buffer
$3 = (uint8_t *) 0x1920 <error: Cannot access memory at address 0x1920>
(gdb) p src_buffer
$4 = (uint8_t *) 0xffff6150 ""
(gdb) p width
$5 = 11764
(gdb) p src_buffer + width
$6 = (uint8_t *) 0xffff8f44 ""
(gdb) p Bpp
$7 = <optimized out>
(gdb) p src_buffer + width*16
$8 = (uint8_t *) 0x24090 <error: Cannot access memory at address 0x24090>
(gdb) p mask_buffer
$9 = (uint8_t *) 0x1920 <error: Cannot access memory at address 0x1920>
(gdb) p src_buffer + width*8
$10 = (uint8_t *) 0xd0f0 <error: Cannot access memory at address 0xd0f0>
(gdb) p src_buffer + width*4
$11 = (uint8_t *) 0x1920 <error: Cannot access memory at address 0x1920>
(gdb) p scanline_buffer
$12 = (uint8_t *) 0xffff6150 ""
(gdb) p src_buffer
$13 = (uint8_t *) 0xffff6150 ""
(gdb) p width
$14 = 11764

I'm attaching a patch which I think solves the issue correctly.  Please let me know what you think.
Comment 7 Ludovic Courtès 2015-09-21 13:33:45 UTC
Created attachment 118382 [details] [review]
Avoid numerical overflow in pointer arithmetic.
Comment 8 Siarhei Siamashka 2015-09-21 14:05:22 UTC
Thanks for debugging this and also for the detailed analysis of the problem! This looks like a rather important bugfix. Please send your patch to the pixman mailing list for review: http://lists.freedesktop.org/mailman/listinfo/pixman

Though based on a quick look, the first revision of the patch does not seem to be entirely correct because the ALIGN macro is not fully accounted for. This macro ensures 16 bytes alignment, which is needed for SIMD optimized code. And using just (width + 1) is not sufficient for this.

Also this whole part of code may need to rewritten a bit, because calculating addresses beyond the array bounds even without dereferencing them is a bad idea: http://stackoverflow.com/questions/988158/take-the-address-of-a-one-past-the-end-array-element-via-subscript-legal-by-the
Comment 9 Oded Gabbay 2015-09-22 10:25:54 UTC
Patch that fixes this problem was pushed to master and to 0.32 branch.
A new stable release, 0.32.8, will be made very soon to include this patch


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.