Bug 109695 - qemu using spice gl and sandbox resourcecontrol=deny crashes with SIGSYS on radeonsi
Summary: qemu using spice gl and sandbox resourcecontrol=deny crashes with SIGSYS on r...
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi (show other bugs)
Version: 18.3
Hardware: x86-64 (AMD64) Linux (All)
: medium major
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-20 20:12 UTC by Ahzo
Modified: 2019-04-13 13:50 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments

Description Ahzo 2019-02-20 20:12:32 UTC
Since upgrading Mesa from 18.2 to 18.3, launching a QEMU virtual machine with Spice OpenGL enabled (for virgl), causes QEMU to crash with SIGSYS inside the radeonsi driver. The reason for this is that the QEMU sandbox option 'resourcecontrol=deny' disables the sched_setaffinity syscall called in pthread_setaffinity_np, which is now used by the radeonsi driver.

A simple way to reproduce this problem is:
$ gdb --batch --ex run --ex bt --args qemu-system-x86_64 -spice gl=on -sandbox on,resourcecontrol=deny
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff45aa700 (LWP 23432)]
[New Thread 0x7ffff08e5700 (LWP 23433)]
[New Thread 0x7fffe3fff700 (LWP 23434)]
[New Thread 0x7fffe37fe700 (LWP 23435)]

Thread 4 "qemu-system-x86" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7fffe3fff700 (LWP 23434)]
0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
34	../sysdeps/unix/sysv/linux/pthread_setaffinity.c: No such file or directory.
#0  0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
#1  0x00007ffff12ba2b3 in util_queue_thread_func (input=input@entry=0x55555640b1f0) at ../src/util/u_queue.c:252
#2  0x00007ffff12b9c17 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
#3  0x00007ffff68c1fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4  0x00007ffff67f280f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


The problematic code at src/util/u_queue.c:252 was added in the following commit:
commit d877451b48a59ab0f9a4210fc736f51da5851c9a
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Mon Oct 1 15:51:06 2018 -0400

    util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY
    
    Initial version discussed with Rob Clark under a different patch name.
    This approach leaves his driver unaffected.


Since setting the thread affinity seems non-essential here, the failing syscall should be handled gracefully, for example by setting a signal handler to ignore the SIGSYS signal.
Comment 1 Marek Olšák 2019-02-20 21:30:42 UTC
Mesa needs a way to query that it can't set thread affinity.
Comment 2 Ahzo 2019-02-21 17:15:56 UTC
To check for the availability of the syscall, one can try it in a child process and see if the child is terminated by a signal, e.g. like this:

#include <stdbool.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/wait.h>

static bool
can_set_affinity()
{
   pid_t pid = fork();
   int status = 0;
   if (!pid) {
      /* Disable coredumps, because a SIGSYS crash is expected. */
      struct rlimit limit = { 0 };
      limit.rlim_cur = 1;
      limit.rlim_max = 1;
      setrlimit(RLIMIT_CORE, &limit);
      /* Test the syscall in the child process. */
      syscall(SYS_sched_setaffinity, 0, 0, 0);
      _exit(0);
   } else if (pid < 0) {
      return false;
   }
   if (waitpid(pid, &status, 0) < 0) {
      return false;
   }
   if (WIFSIGNALED(status)) {
      /* The child process was terminated by a signal,
       * thus the syscall cannot be used.
       */
      return false;
   }
   return true;
}
Comment 3 Daniel P. Berrange 2019-02-27 14:42:21 UTC
(In reply to Ahzo from comment #2)
> To check for the availability of the syscall, one can try it in a child
> process and see if the child is terminated by a signal, e.g. like this:

Afraid not, QEMU's seccomp filter blocks use of fork() too :-)
Comment 4 Daniel P. Berrange 2019-02-27 14:45:24 UTC
(In reply to Ahzo from comment #0)
> The problematic code at src/util/u_queue.c:252 was added in the following
> commit:
> commit d877451b48a59ab0f9a4210fc736f51da5851c9a
> Author: Marek Olšák <marek.olsak@amd.com>
> Date:   Mon Oct 1 15:51:06 2018 -0400
> 
>     util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY
>     
>     Initial version discussed with Rob Clark under a different patch name.
>     This approach leaves his driver unaffected.
> 
> 
> Since setting the thread affinity seems non-essential here, the failing
> syscall should be handled gracefully, for example by setting a signal
> handler to ignore the SIGSYS signal.

I'm curious what motivated this change to start with ?  Even if QEMU was not enforcing seccomp filters, I think I'd consider it a bug for mesa to be setting its process affinity in this way.  The mgmt application or sysadmin has decided that the process must have a certain affinity, based on how it/they want the host CPUs utilized. Why is mesa wanting to override this administrative policy decision to restrict CPU usage ?
Comment 5 Alex Deucher 2019-02-27 14:54:09 UTC
(In reply to Daniel P. Berrange from comment #4)
> 
> I'm curious what motivated this change to start with ?  Even if QEMU was not
> enforcing seccomp filters, I think I'd consider it a bug for mesa to be
> setting its process affinity in this way.  The mgmt application or sysadmin
> has decided that the process must have a certain affinity, based on how
> it/they want the host CPUs utilized. Why is mesa wanting to override this
> administrative policy decision to restrict CPU usage ?

To improve performance on modern multi-core NUMA architectures.
Comment 6 Marc-Andre Lureau 2019-02-27 23:14:50 UTC
Sent a quick RFC for an env variable workaround on the ML "[PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering".
Comment 7 Marek Olšák 2019-02-28 00:15:48 UTC
(In reply to Daniel P. Berrange from comment #4)
> I'm curious what motivated this change to start with ?  Even if QEMU was not
> enforcing seccomp filters, I think I'd consider it a bug for mesa to be
> setting its process affinity in this way.  The mgmt application or sysadmin
> has decided that the process must have a certain affinity, based on how
> it/they want the host CPUs utilized. Why is mesa wanting to override this
> administrative policy decision to restrict CPU usage ?

The correct solution is to fix pthread_setaffinity such that it returns an error code instead of crashing.

An even better solution would be to have a virtual thread affinity that only the application can see and change, which should be silently masked by administrative policies not visible to the application.
Comment 8 Michel Dänzer 2019-02-28 10:21:44 UTC
(In reply to Marek Olšák from comment #7)
> An even better solution would be to have a virtual thread affinity that only
> the application can see and change, which should be silently masked by
> administrative policies not visible to the application.

Mesa doesn't really need explicit thread affinity at all. All it wants is that certain sets of threads run on the same CPU module; it doesn't care which particular CPU module that is. What's really needed is an API to express this affinity between threads, instead of to specific CPU cores.
Comment 9 Ahzo 2019-03-02 11:36:02 UTC
(In reply to Daniel P. Berrange from comment #3)
> (In reply to Ahzo from comment #2)
> > To check for the availability of the syscall, one can try it in a child
> > process and see if the child is terminated by a signal, e.g. like this:
> 
> Afraid not, QEMU's seccomp filter blocks use of fork() too :-)

Maybe it should, at least when using the spawn=deny option, but currently it doesn't. That option only blocks the fork, vfork and execve syscalls, but glibc's fork() function uses the clone syscall, and thus continues to work.
However, that behavior might be different when using other C library implementations, so it wouldn't be correct to rely on this.
One could use clone() instead of fork(), but future versions of qemu might block the clone syscall, as well.

Unfortunately, I'm not aware of a proper solution for this bug short of adding a new API to the kernel.
Comment 10 Dylan Baker 2019-03-06 18:17:01 UTC
We're getting down to just a few bugs blocking 19.0, so I'm pinging those bugs to see what the progress is?
Comment 11 Dylan Baker 2019-03-11 17:04:20 UTC
I'm removing this from the 19.0 blocking tracker. Generally we don't add bugs to block a release if they were present in the previous release, additionally there doesn't seem to be any consensus on a solution, at this moment. If there is a fix implemented I'd be happy to pull that into a later 19.0 release.
Comment 12 Marek Olšák 2019-04-02 13:03:15 UTC
(In reply to Michel Dänzer from comment #8)
> Mesa doesn't really need explicit thread affinity at all. All it wants is
> that certain sets of threads run on the same CPU module; it doesn't care
> which particular CPU module that is. What's really needed is an API to
> express this affinity between threads, instead of to specific CPU cores.

I think the thread affinity API is a correct way to optimize for CPU cache topologies. pthread is a basic user API. Security policies shouldn't disallow pthread functions.
Comment 13 Ahzo 2019-04-13 13:50:57 UTC
This problem was solved by qemu [1], so this mesa bug can be closed.

[1] https://git.qemu.org/git/qemu.git/?a=commitdiff;h=9a1565a03b79d80b236bc7cc2dbce52a2ef3a1b8


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.