Bug 99066 - Pulseaudio segfaults when ORC is used on x32
Summary: Pulseaudio segfaults when ORC is used on x32
Status: NEW
Alias: None
Product: PulseAudio
Classification: Unclassified
Component: core (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: pulseaudio-bugs
QA Contact: pulseaudio-bugs
URL:
Whiteboard:
Keywords:
Depends on: 103656
Blocks:
  Show dependency treegraph
 
Reported: 2016-12-12 16:32 UTC by EoD
Modified: 2017-11-09 23:14 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
pavucontrol gdb backtrace (8.72 KB, text/x-log)
2017-01-02 12:08 UTC, EoD
Details
pulseaudio gdb backtrace (13.66 KB, text/plain)
2017-01-03 19:18 UTC, EoD
Details
svolume-orc-gen.c (18.84 KB, text/x-csrc)
2017-01-03 19:48 UTC, EoD
Details

Note You need to log in before you can comment on or make changes to this bug.
Description EoD 2016-12-12 16:32:13 UTC
I am using a x32 environment on a Gentoo machine with:
- kernel 4.8.14
- Pulseaudio 9.0
- orc 0.4.26

When I enable ORC, start pulseaudio (either 8.0 or 9.0) and use pavucontrol to change the volume of my HDMI sink, pavucontrol and alsa-sink-HDMI segfault.

> traps: alsa-sink-HDMI [16977] general protection ip:f77420c0 sp:f3016868 error:0 in orcexec.kq0Hm1

If I change the volume of a analog sink the issues is not triggered.
If I change the volume via pacmd or change the volume via mplayer, the issue is not triggered.

When I use "PULSE_NO_SIMD=1 pulseaudio", the issue does not occur as well.
Comment 1 Tanu Kaskinen 2016-12-12 17:00:40 UTC
It's weird that this doesn't happen with the analog sink. The difference could be due to having hardware volume for the analog sink but not for the hdmi sink. Can you reproduce this on either sink if you set "flat-volumes = no" in ~/.config/pulse/daemon.conf and change a stream volume rather than a sink volume?
Comment 2 EoD 2016-12-12 19:15:53 UTC
(In reply to Tanu Kaskinen from comment #1)
> It's weird that this doesn't happen with the analog sink. The difference
> could be due to having hardware volume for the analog sink but not for the
> hdmi sink. Can you reproduce this on either sink if you set "flat-volumes =
> no" in ~/.config/pulse/daemon.conf and change a stream volume rather than a
> sink volume?

The option does not make a difference, but I found a few more patterns:

- When I just *start* pavucontrol while mplayer is playing via pulse, I get an segfault. The same happens when pavucontrol is running and mplayer is started.

- When I try to switch an analog stream to a hdmi stream in pavucontrol, I get a segfault as well.
Comment 3 Tanu Kaskinen 2016-12-14 14:21:55 UTC
Do you know how to get a backtrace with gdb?
Comment 4 EoD 2017-01-02 12:08:21 UTC
Created attachment 128708 [details]
pavucontrol gdb backtrace

(In reply to Tanu Kaskinen from comment #3)
> Do you know how to get a backtrace with gdb?

What backtrace did you have in mind? I ran a backtrace on pavucontrol while changing the volume slider for the HDMI sink (see attachment).
Comment 5 EoD 2017-01-03 19:18:49 UTC
Created attachment 128732 [details]
pulseaudio gdb backtrace

Here is a gdb backtrace for the pulseaudio instance
Comment 6 EoD 2017-01-03 19:48:56 UTC
Created attachment 128733 [details]
svolume-orc-gen.c
Comment 7 Tanu Kaskinen 2017-01-04 18:06:09 UTC
Debugging this isn't going to be fun. The backtrace isn't very helpful, because the crash happens in ORC-generated code that can be only debugged by reading the assembly code. I'm tempted to disable ORC in pulseaudio until ORC gets fixed (or until someone shows that this is actually a pulseaudio bug)... but let's not give up quite yet.

I could try to reproduce this, but I don't have HDMI and I don't have an x32 environment. Can you reproduce this with module-null-sink instead of the HDMI output? As for the x32 environment, do you think it would be possible for me to run an x32 environment in a chroot in an otherwise 64-bit system?
Comment 8 EoD 2017-01-04 20:12:03 UTC
(In reply to Tanu Kaskinen from comment #7)
> Debugging this isn't going to be fun. The backtrace isn't very helpful,
> because the crash happens in ORC-generated code that can be only debugged by
> reading the assembly code. I'm tempted to disable ORC in pulseaudio until
> ORC gets fixed (or until someone shows that this is actually a pulseaudio
> bug)... but let's not give up quite yet.
> 
> I could try to reproduce this, but I don't have HDMI and I don't have an x32
> environment. Can you reproduce this with module-null-sink instead of the
> HDMI output? As for the x32 environment, do you think it would be possible
> for me to run an x32 environment in a chroot in an otherwise 64-bit system?

I tried a chroot inside my amd64 system and I can reproduce the crash there. But I do have a multilib (glibc/gcc) setup with amd64, x86 and x32.
Comment 9 Tanu Kaskinen 2017-01-06 01:38:25 UTC
Did you try with module-null-sink? If the crash only happens with HDMI, I will likely not be able to reproduce it.
Comment 10 EoD 2017-01-06 02:40:55 UTC
(In reply to Tanu Kaskinen from comment #9)
> Did you try with module-null-sink? If the crash only happens with HDMI, I
> will likely not be able to reproduce it.

Yes, it also happens with a null-sink. The backtrace looks basically identical on the steps #0, #1 and #2.
Comment 11 Tanu Kaskinen 2017-01-10 20:54:45 UTC
A small status update: I managed to reproduce the crash in an x32 chroot. Then I built pulseaudio without optimizations, because I didn't like gdb telling me "optimized out" when I asked it something, and now I get a segfault during startup in pulsecore/cpu-x86.c:33 get_cpuid(). I'll try to figure out the cpuid segfault first, but if that'll turn out to be too difficult, I'll continue by taking the crashing volume code and putting it in a small self-contained test case.
Comment 12 Tanu Kaskinen 2017-01-18 08:25:58 UTC
I found out the reason for the get_cpuid() crash. The function looks like this (slightly edited for readability):

static void get_cpuid(uint32_t op,
                      uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
    __asm__ __volatile__ (
        "  push %%rbx          \n\t"
        "  cpuid               \n\t"
        "  mov %%ebx, %%esi    \n\t"
        "  pop %%rbx           \n\t"

        : "=a" (*a), "=S" (*b), "=c" (*c), "=d" (*d)
        : "0" (op)
    );
}

When building without optimizations, the function parameters are saved in the stack. The stack pointer register isn't updated, however, so our push instruction ends up overwriting the op and a parameters! Before that happens, the op parameter is saved in the eax register, and the stack memory for op won't be needed after that, so that's fine, but the a parameter is replaced with whatever happened to be in the rbx register. When trying to save the cpuid result in the memory pointed to by a, a segfault happens, because a is an invalid pointer at that point.

If the stack pointer was updated when the compiler wrote the parameters in the stack, this would have been avoided. So is this a compiler bug? I'm not sure. I found out that the AMD64 ABI allows the compiler to not update the stack pointer in functions that don't call other functions. However, if the stack pointer is not updated, inline assembly that relies on the stack pointer will not work as intended (as seen here), so maybe the compiler should always update the stack pointer if it sees that the function contains inline assembly code that has push/pop instructions?

This code happens to work on a "normal" 64-bit system by chance, because the memory layout for the pointers is a bit different, and the push operation will only overwrite the op parameter, which does no harm. When building with optimizations, I guess the function parameters aren't saved in the stack, so that will also avoid the crash.

I guess I'll have to prepare a bug report for GCC next (I'm not sure if it's a compiler bug, but hopefully they'll tell me).
Comment 13 Arun Raghavan 2017-01-25 07:18:04 UTC
Also CC'ing Wim in case he's seen this before.
Comment 14 Tanu Kaskinen 2017-02-04 18:53:36 UTC
I was asked for more information about the cpuid crash, so here we go:

This is the code that GCC generates for get_cpuid() on x32:

0xf7b6a270 <get_cpuid>          push   %rbp
0xf7b6a271 <get_cpuid+1>        mov    %esp,%ebp
0xf7b6a273 <get_cpuid+3>        mov    %edi,-0x4(%ebp)
0xf7b6a277 <get_cpuid+7>        mov    %rcx,%rax
0xf7b6a27a <get_cpuid+10>       mov    %r8,%rcx
0xf7b6a27d <get_cpuid+13>       mov    %esi,-0x8(%ebp)
0xf7b6a281 <get_cpuid+17>       mov    %edx,-0xc(%ebp)
0xf7b6a285 <get_cpuid+21>       mov    %eax,-0x10(%ebp)
0xf7b6a289 <get_cpuid+25>       mov    %ecx,-0x14(%ebp)
0xf7b6a28d <get_cpuid+29>       mov    -0x4(%ebp),%eax      [breakpoint]
0xf7b6a291 <get_cpuid+33>       push   %rbx
0xf7b6a292 <get_cpuid+34>       cpuid
0xf7b6a294 <get_cpuid+36>       mov    %ebx,%esi
0xf7b6a296 <get_cpuid+38>       pop    %rbx
0xf7b6a297 <get_cpuid+39>       mov    -0x8(%ebp),%edi
0xf7b6a29b <get_cpuid+43>       mov    %eax,(%edi)          [segfault]
0xf7b6a29e <get_cpuid+46>       mov    -0xc(%ebp),%eax
0xf7b6a2a2 <get_cpuid+50>       mov    %esi,(%eax)
0xf7b6a2a5 <get_cpuid+53>       mov    -0x10(%ebp),%eax
0xf7b6a2a9 <get_cpuid+57>       mov    %ecx,(%eax)
0xf7b6a2ac <get_cpuid+60>       mov    -0x14(%ebp),%eax

"[breakpoint]" marks the place where the execution stops if you set a breakpoint with "break get_cpuid". "[segfault]" marks the place where the crash happens.

Before the breakpoint there's the code that copies the function parameters to the stack as follows:

%edi is the "op" parameter. It's saved to -0x4(%ebp).
%rcx is the "c" parameter. It's moved to %rax and from %rax to -0x10(%ebp).
%r8 is the "d" parameter. It's moved to %rcx and from %rcx to -0x14(%ebp).
%esi is the "a" parameter. It's saved to -0x8(%ebp).
%edx is the "b" parameter. It's saved to -0xc(%ebp).

The stack pointer is not updated when the parameters are saved to the stack. Since the stack pointer points to the beginning of the frame, the push instruction overwrites 8 bytes from the beginning of the frame, overwriting the "op" and "a" parameters.

I think the push is done, because the %rbx register is special in that it must always have the same value when returning from a function as it had when the function started. The cpuid instruction modifies the %rbx register, so that's why we need to save and restore the %rbx register.

After the pop, this happens:

0xf7b6a297 <get_cpuid+39>       mov    -0x8(%ebp),%edi

This reads the stack from the position where the "a" parameter was saved. The compiler seems to assume that it has the same value that was written there in the beginning of the function, but the push instruction has written some random garbage there.

0xf7b6a29b <get_cpuid+43>       mov    %eax,(%edi)          [segfault]

This is supposed to save the return value (well, one part of the return value) of the cpuid instruction to the address stored in %edi, but we just wrote garbage to %edi, so we end up dereferencing using garbage as the pointer (in my tests the value in %edi was 1).
Comment 15 EoD 2017-11-08 23:39:23 UTC
(In reply to Tanu Kaskinen from comment #12)
> I guess I'll have to prepare a bug report for GCC next (I'm not sure if it's
> a compiler bug, but hopefully they'll tell me).

Could you link the bug report by any chance?
Comment 16 Tanu Kaskinen 2017-11-09 12:21:47 UTC
(In reply to EoD from comment #15)
> Could you link the bug report by any chance?

Sorry, I never filed that bug, because I got distracted by other stuff. Would you be able to report the bug?
Comment 17 EoD 2017-11-09 12:31:28 UTC
(In reply to Tanu Kaskinen from comment #16)
> (In reply to EoD from comment #15)
> > Could you link the bug report by any chance?
> 
> Sorry, I never filed that bug, because I got distracted by other stuff.
> Would you be able to report the bug?

Sure, but are you able to reproduce the issue with GCC 5.5, 6.4 or even 7.2? 
I cna just copy&paste the problem with get_cpuid() here, but it might be easier if I had a way to reproduce the issue here. How did you create/read the asm?
Comment 18 Tanu Kaskinen 2017-11-09 13:12:25 UTC
It looks like I've deleted the x32 chroot directory, so I can't check what the GCC version was in January. The distribution was Debian testing or Debian unstable.

Getting a simple test program for reproducing would be good indeed. The asm that I provided is for pulseaudio's get_cpuid() function, which should be reasonably simple to isolate into a test program.

I got the asm code with gdb:

$ gdb pulseaudio
(gdb) break get_cpuid
Function "get_cpuid" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (get_cpuid) pending.
(gdb) run
Breakpoint 1, get_cpuid (op=0, a=0x7fffffffddcc, b=0x7fffffffddd8, c=0x7fffffffddd4, d=0x7fffffffddd0) at pulsecore/cpu-x86.c:33
33	    __asm__ __volatile__ (
(gdb) layout asm

That will open a listing of assembly code in the top half of the gdb terminal. You can move up and down in the listing with arrow keys.
Comment 19 EoD 2017-11-09 23:14:08 UTC
(In reply to Tanu Kaskinen from comment #16)
> 
> Sorry, I never filed that bug, because I got distracted by other stuff.
> Would you be able to report the bug?
I reported it and it has been "RESOLVED INVALID" shortly after. 

In order to stay on the actual ORC issue here, I think it might be best to move the get_cpuid discussion to bug 103656


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct.