Bug 59001 - [Wayland] queue-test test failed on ppc64
Summary: [Wayland] queue-test test failed on ppc64
Status: RESOLVED FIXED
Alias: None
Product: Wayland
Classification: Unclassified
Component: wayland
Version: unspecified
Hardware: PowerPC Linux (All)
Importance: medium normal
Assignee: Wayland bug list
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-03 22:19 UTC by Dinar Valeev
Modified: 2013-02-28 02:50 UTC
0 users

See Also:


Attachments
Fix alignment (3.67 KB, patch)
2013-02-08 02:10 UTC, Kristian Høgsberg
strace output (9.89 KB, text/plain)
2013-02-11 18:06 UTC, Dinar Valeev
new strace (10.16 KB, text/plain)
2013-02-11 19:07 UTC, Dinar Valeev
strace -f in non chroot (49.02 KB, text/plain)
2013-02-11 22:11 UTC, Dinar Valeev

Description Dinar Valeev 2013-01-03 22:19:57 UTC
On ppc64 queue-test fails with:
test "queue":	queue-test.c:86: Assertion `counter == 1' failed.
signal 11, fail.
1 tests, 0 pass, 1 fail
FAIL: queue-test

wayland version is 1.0.3
OS: openSUSE Factory
buildlog available here: https://build.opensuse.org/package/live_build_log?arch=ppc64&package=wayland&project=openSUSE%3AFactory%3APowerPC&repository=standard
Comment 1 Dinar Valeev 2013-01-04 00:52:55 UTC
Log with WAYLAND_DEBUG=1:
[3315420.494]  -> wl_display@1.get_registry(new id wl_registry@2)
[3315420.589]  -> wl_display@1.sync(new id wl_callback@3)
[3315420.751] wl_display@1.get_registry(new id wl_registry@993740600)
queue-test.c:86: Assertion `counter == 1' failed.
test "queue":	signal 11, fail.
Comment 2 Jonas Ådahl 2013-01-04 08:33:18 UTC
It looks like wl_display_roundtrip() returns before it receives the done event. As of 1.0.3, the only way this can happen is if an error occurs while dispatching the queue. Could you add a check on the return value of wl_display_roundtrip() just before the failing assert and see whether it returns -1?
Comment 3 Dinar Valeev 2013-01-04 10:20:14 UTC
wl_display_roundtrip() returns -1 in my case
Comment 4 Jonas Ådahl 2013-01-04 10:38:13 UTC
So queue dispatching fails for some reason. You should be able to check errno to see why it failed. It could be a broken pipe, for example because the server part crashed. I don't have access to any ppc64 hardware, so I cannot debug this myself.
Comment 5 Dinar Valeev 2013-01-04 11:00:57 UTC
errno is 32, so it is a broken pipe (EPIPE), right?
Comment 6 Jonas Ådahl 2013-01-04 11:11:01 UTC
It seems so yes.
Comment 7 Dinar Valeev 2013-01-04 16:57:59 UTC
len = recvmsg(sockfd, msg, flags | MSG_CMSG_CLOEXEC);
returns 0; does that mean the connection was terminated?

Any hints to debug it further?
Comment 8 Jonas Ådahl 2013-01-04 17:11:29 UTC
Looking at the output you provided, the problem seems to be in the server process. Note the "signal 11, fail" i.e. "segmentation fault" message. The queue test consists of two processes; the server process, and the forked client process. It looks like the server process is the one that crashes, resulting in no registry objects being transmitted and counter not reaching 2 when it gets the EPIPE error when reading the socket.
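A minimal C sketch of the failure mode described above, assuming a plain AF_UNIX stream socketpair rather than the real Wayland connection code: a forked stand-in "server" exits without writing anything, after which the surviving end sees recvmsg() return 0 (end of file) and a subsequent write fail with EPIPE, which is errno 32 on Linux. The function name demo_peer_death() is hypothetical.

```c
#define _GNU_SOURCE /* for MSG_CMSG_CLOEXEC */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the "client" end observed EOF followed by EPIPE after
 * its peer process died, -1 otherwise. */
static int demo_peer_death(void)
{
	int fds[2];
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
		return -1;

	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Stand-in for the crashing server: exit without sending
		 * anything. The kernel closes fds[0] on exit, just as it
		 * would after a SIGSEGV. */
		close(fds[1]);
		_exit(1);
	}

	close(fds[0]);          /* client keeps only its own end */
	waitpid(pid, NULL, 0);  /* make sure the peer is gone */

	char buf[64];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof buf };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t len = recvmsg(fds[1], &msg, MSG_CMSG_CLOEXEC);
	printf("recvmsg returned %zd\n", len); /* 0 means orderly EOF */

	/* MSG_NOSIGNAL turns SIGPIPE into a plain EPIPE error. */
	ssize_t sent = send(fds[1], "x", 1, MSG_NOSIGNAL);
	int err = errno;
	printf("send returned %zd, errno %d (%s)\n", sent, err, strerror(err));

	close(fds[1]);
	return (len == 0 && sent == -1 && err == EPIPE) ? 0 : -1;
}
```

The read side only learns that the peer is gone (a return of 0); the EPIPE appears once something tries to write, which fits a client that sees the error while flushing or reading during a roundtrip.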

Debugging the test cases is not very convenient, but you can try adding a sleep to the server part, starting the test case, and attaching gdb to the server before it continues to see where it crashes. Be careful not to attach to the client process or the test runner process.
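One way to implement the "add a sleep" suggestion, sketched in C under the assumption that you can edit the server side of the test: a hypothetical maybe_wait_for_debugger() helper that only blocks when a WAIT_FOR_GDB environment variable (also an invented name) is set, so a normal test run is unaffected.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not part of the Wayland test suite. gdb clears
 * this flag to let the process continue:
 *   gdb -p <pid>
 *   (gdb) set var debugger_wait = 0
 *   (gdb) continue
 */
static volatile sig_atomic_t debugger_wait = 1;

/* If WAIT_FOR_GDB is set, print our pid and spin until a debugger
 * clears debugger_wait. Returns 1 if it waited, 0 otherwise. */
static int maybe_wait_for_debugger(const char *who)
{
	if (getenv("WAIT_FOR_GDB") == NULL)
		return 0;

	fprintf(stderr, "%s: pid %ld waiting for debugger\n",
		who, (long)getpid());
	while (debugger_wait)
		sleep(1);
	return 1;
}
```

Calling it early in the server code, running the test with WAIT_FOR_GDB=1 and attaching to the printed pid targets the server directly, rather than the client or runner processes.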
Comment 9 Kristian Høgsberg 2013-01-04 19:54:28 UTC
If this only happens on ppc64, it could be an alignment problem, i.e. that we write or read a 32-bit value at an address that's not a multiple of 4 bytes (or a 64-bit value or pointer at an address that's not a multiple of 8 bytes).
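To make the two cases concrete, a small C sketch with hypothetical helper names: is_aligned() tests an address against a required alignment, and the memcpy()-based helpers show one portable way to store or load a pointer at an offset that may not be 8-byte aligned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if p is a multiple of align bytes. */
static bool is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p % align) == 0;
}

/* Store a pointer into a byte buffer at an arbitrary offset without
 * assuming alignment. memcpy lets the compiler pick safe instructions,
 * whereas *(void **)(buf + off) = v is undefined behaviour when
 * misaligned and may compile to a single 8-byte store that faults on
 * some 64-bit targets. */
static void store_ptr_unaligned(unsigned char *buf, size_t off, void *v)
{
	memcpy(buf + off, &v, sizeof v);
}

/* Matching load: copy the bytes back out into a properly aligned local. */
static void *load_ptr_unaligned(const unsigned char *buf, size_t off)
{
	void *v;
	memcpy(&v, buf + off, sizeof v);
	return v;
}
```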
Comment 10 Dinar Valeev 2013-01-04 20:48:00 UTC
Yes, this is a ppc64-only (64-bit) issue; 32-bit ppc passes this test.
Comment 11 Philip Withnall 2013-02-04 11:12:35 UTC
Most likely this: http://lists.freedesktop.org/archives/wayland-devel/2013-February/007275.html.

Copied for posterity:

Around line 740 of connection.c, demarshalling an object:

        id = (uint32_t **) extra;
        extra += sizeof *id;
        closure->args[i] = id;
        *id = p;

On 64-bit MIPS, the assignment to *id gets turned into a
store-double-word instruction (since pointer 'p' is 64 bits wide), which
must be to a 8-byte-aligned address. It's possible for 'extra' to not be
8-byte aligned, and hence for the store to not be aligned.

In the particular case I'm hitting, 'extra' is not 8-byte-aligned
because the message size is 12, but it also looks like alignment could
be changed in other ways; e.g. during handling a 'h'-type argument near
the bottom of the function, where 'extra' is incremented by the size of
an int.
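A sketch of one possible remedy for the quoted code, assuming the 'extra' scratch area itself starts suitably aligned: round the cursor up to the pointer's alignment before carving out the slot. The helpers align_up() and store_object_slot() are hypothetical and are not taken from attachment 74394 or from connection.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Round p up to the next multiple of align (a power of two). */
static char *align_up(char *p, size_t align)
{
	uintptr_t v = (uintptr_t)p;
	v = (v + align - 1) & ~(uintptr_t)(align - 1);
	return (char *)v;
}

/* Hypothetical version of the quoted demarshalling step: reserve an
 * aligned, pointer-sized slot in the 'extra' area, store p in it,
 * report the slot through out_slot and return the advanced cursor. */
static char *store_object_slot(char *extra, void ***out_slot, void *p)
{
	void **id;

	extra = align_up(extra, _Alignof(void *));
	id = (void **)extra;
	extra += sizeof *id;
	*id = p; /* now an aligned store, even for 8-byte pointers */
	*out_slot = id;
	return extra;
}
```

Padding the cursor costs at most seven bytes per pointer argument, so the scratch area has to be sized with that slack in mind.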
Comment 12 Kristian Høgsberg 2013-02-08 02:10:04 UTC
Created attachment 74394 [details] [review]
Fix alignment

This patch should fix the 64-bit alignment problems.  Care to give it a try?
Comment 13 Philip Withnall 2013-02-08 08:54:50 UTC
(In reply to comment #12)
> Created attachment 74394 [details] [review]
> Fix alignment
> 
> This patch should fix the 64-bit alignment problems.  Care to give it a try?

Works great for me (on Linux and MIPS)!
Comment 14 Dinar Valeev 2013-02-08 10:50:24 UTC
The patch doesn't fix queue-test failure on ppc64
Comment 15 Kristian Høgsberg 2013-02-08 16:35:18 UTC
(In reply to comment #14)
> The patch doesn't fix queue-test failure on ppc64

Could you try running queue-test under gdb and get a stack trace?  If you say

 $ libtool --mode=execute gdb ./queue-test

and then type run, gdb should follow the parent process (the server), which is where the segfault happens. When you get the segfault, type bt to get a backtrace and attach that here. Thanks.
Comment 16 Dinar Valeev 2013-02-08 16:58:48 UTC
Got no stack here:
(gdb) run
Starting program: /home/abuild/rpmbuild/BUILD/wayland-1.0.3/tests/.libs/queue-test 
warning: Could not load shared library symbols for linux-vdso64.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Detaching after fork from child process 28649.
queue-test: test-helpers.c:39: count_open_fds: Assertion `dir && "opening /proc/self/fd failed."' failed.
test "queue":	signal 6, fail.
1 tests, 0 pass, 1 fail
[Inferior 1 (process 28648) exited with code 01]
(gdb) bt
No stack.
Comment 17 Kristian Høgsberg 2013-02-08 17:10:00 UTC
(In reply to comment #16)

That's a different bug than the segfault (signal 11) above.  The test has a built-in check for leaking fds which needs to read /proc and that's failing for some reason... running in a chroot?
Comment 18 Dinar Valeev 2013-02-08 17:16:41 UTC
Yes, I'm running in chroot, but with /proc mounted

abuild@wolfberry-1:~/rpmbuild/BUILD/wayland-1.0.3/tests> libtool --mode=execute gdb ./queue-test
GNU gdb (GDB) SUSE (7.5.1-1.1)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/abuild/rpmbuild/BUILD/wayland-1.0.3/tests/.libs/queue-test...done.
(gdb) run
Starting program: /home/abuild/rpmbuild/BUILD/wayland-1.0.3/tests/.libs/queue-test 
Missing separate debuginfo for /lib64/ld64.so.1
Try: zypper install -C "debuginfo(build-id)=c5b01adb2370f144d08c65f2e6f2000a715fe708"
warning: Could not load shared library symbols for linux-vdso64.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Missing separate debuginfo for /lib64/libdl.so.2
Try: zypper install -C "debuginfo(build-id)=318d19287fdb90b171b307d748fe5a366548202d"
Missing separate debuginfo for /lib64/libc.so.6
Try: zypper install -C "debuginfo(build-id)=8e29c7c7c3bf9106db1d18677425233bddd086b9"
Missing separate debuginfo for /usr/lib64/libffi.so.4
Try: zypper install -C "debuginfo(build-id)=db9a86960817b058b8d718b25b14798c76c1951a"
Missing separate debuginfo for /lib64/librt.so.1
Try: zypper install -C "debuginfo(build-id)=6596a9d63e16d493af356fa6498322558dfb0b88"
Missing separate debuginfo for /lib64/libpthread.so.0
Try: zypper install -C "debuginfo(build-id)=771909dc5849650e92bb91ec07a494046da52c0c"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Detaching after fork from child process 29463.
queue-test: queue-test.c:273: queue: Assertion `ret == 0' failed.
test "queue":	signal 6, fail.
1 tests, 0 pass, 1 fail
[Inferior 1 (process 29459) exited with code 01]
(gdb) bt
No stack.
Comment 19 Kristian Høgsberg 2013-02-11 17:04:41 UTC
(In reply to comment #18)

This is a different failure than in comment 16. This time the test is failing to create the server socket, which I suspect is another problem with running the test in a chroot. Can you try running the test under strace?

 $ libtool --mode=execute strace -olog.txt ./queue-test

and attach the log.txt?
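To check socket creation directly inside the chroot, a hypothetical C helper, try_listen_unix(), creates a listening AF_UNIX stream socket at a given path and returns -1 with errno set by the failing step. It assumes the server socket is a filesystem-path Unix domain socket; libwayland-server creates its socket under $XDG_RUNTIME_DIR, so that directory is the natural place to try.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a listening AF_UNIX socket at 'path'. Returns the fd, or -1
 * with errno from the step that failed (socket, bind or listen). */
static int try_listen_unix(const char *path)
{
	struct sockaddr_un addr;
	int fd, saved;

	if (strlen(path) >= sizeof addr.sun_path) {
		errno = ENAMETOOLONG;
		return -1;
	}

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof addr);
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
	    listen(fd, 1) < 0) {
		saved = errno; /* close() may clobber errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

The caller can print strerror(errno) on failure; errors such as ENOENT or EACCES would point at the directory setup in the chroot rather than at Wayland itself.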
Comment 20 Dinar Valeev 2013-02-11 18:06:38 UTC
Created attachment 74630 [details]
strace output

Here is requested info
Comment 21 Kristian Høgsberg 2013-02-11 18:57:13 UTC
(In reply to comment #20)
> Created attachment 74630 [details]
> strace output
> 
> Here is requested info

Oh, oops, that's just output for the test runner, which forks to run the actual test case.  Try this instead:

 $ libtool --mode=execute strace -olog.txt ./queue-test queue
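A hypothetical C sketch of the fork-per-test pattern described here, not the actual test runner source: run_one_test() runs each test body in a child process, and the parent only reports how the child ended, using a format loosely modeled on the "signal 6, fail." lines above. Because the body runs in the child, strace without -f only traces the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run body() in a child and report the outcome. Returns 1 on pass. */
static int run_one_test(const char *name, void (*body)(void))
{
	int status;

	fflush(stdout); /* avoid duplicating buffered output in the child */
	pid_t pid = fork();
	if (pid < 0)
		return 0;
	if (pid == 0) {
		body();
		fflush(NULL);
		_exit(EXIT_SUCCESS); /* skip the parent's atexit handlers */
	}
	if (waitpid(pid, &status, 0) < 0)
		return 0;

	printf("test \"%s\":\t", name);
	if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
		printf("exit status 0, pass.\n");
		return 1;
	}
	if (WIFSIGNALED(status))
		printf("signal %d, fail.\n", WTERMSIG(status));
	else
		printf("exit status %d, fail.\n", WEXITSTATUS(status));
	return 0;
}

/* Example bodies for illustration: abort() raises SIGABRT (signal 6). */
static void example_pass(void) { }
static void example_crash(void) { abort(); }
```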
Comment 22 Dinar Valeev 2013-02-11 19:07:59 UTC
Created attachment 74645 [details]
new strace
Comment 23 Kristian Høgsberg 2013-02-11 19:37:17 UTC
(In reply to comment #22)
> Created attachment 74645 [details]
> new strace

Does the test run outside the chroot for you? I don't see a failure in the strace output, but the queue test itself forks, so it could be the child that is failing. Can you try adding -f to the strace arguments?
Comment 24 Dinar Valeev 2013-02-11 22:11:40 UTC
Created attachment 74656 [details]
strace -f in non chroot

OK, here is the strace output from libtool --mode=execute strace -f ./queue-test

gdb in non chroot still gives me no stack.

Let me know if you need more information.
Comment 25 Kristian Høgsberg 2013-02-26 18:51:02 UTC
I've just committed Jason's cleanup of the connection code, which should remove the source of these alignment problems. Can you try git master again? We'll backport it to a 1.0 release if it works out.

commit 2fc248dc2c877d02694db40aad52180d71373d5a
Author: Jason Ekstrand <jason@jlekstrand.net>
Date:   Tue Feb 26 11:30:51 2013 -0500

    Clean up and refactor wl_closure and associated functions
Comment 26 Dinar Valeev 2013-02-28 02:12:11 UTC
Yes. Master works for me.
Comment 27 Kristian Høgsberg 2013-02-28 02:50:22 UTC
(In reply to comment #26)
> Yes. Master works for me.

Great, thanks for testing the fix.  I'll pull the fix back into the 1.0.6 release.

