Bug 61708

Summary: [SNB/SNA] MPlayer crashes Xserver
Product: xorg
Reporter: JS <js314592>
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: RESOLVED NOTOURBUG
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description          Flags
Xorg.log             none
Xorg.log with debug  none

Description JS 2013-03-02 18:29:50 UTC
After upgrading the kernel from 3.7.9 to 3.8.1, MPlayer with -vo gl output crashes the X server if SNA is enabled.

[   147.303] (EE)
[   147.303] (EE) Backtrace:
[   147.303] (EE) 0: /usr/bin/X (xorg_backtrace+0x47) [0x19608d86d7]
[   147.303] (EE) 1: /usr/bin/X (0x1960707000+0x1d5c89) [0x19608dcc89]
[   147.303] (EE) 2: /lib64/libpthread.so.0 (0x317a1657000+0xfce0) [0x317a1666ce0]
[   147.303] (EE) 3: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x3179eda3000+0xbf149) [0x3179ee62149]
[   147.303] (EE) 4: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x3179eda3000+0xbfecc) [0x3179ee62ecc]
[   147.303] (EE) 5: /usr/bin/X (DRI2SwapBuffers+0x27d) [0x19608a7ccd]
[   147.303] (EE) 6: /usr/bin/X (0x1960707000+0x1a20e4) [0x19608a90e4]
[   147.303] (EE) 7: /usr/bin/X (0x1960707000+0x56c81) [0x196075dc81]
[   147.303] (EE) 8: /usr/bin/X (0x1960707000+0x4368c) [0x196074a68c]
[   147.303] (EE) 9: /lib64/libc.so.6 (__libc_start_main+0xed) [0x317a02853bd]
[   147.303] (EE) 10: /usr/bin/X (0x1960707000+0x439f9) [0x196074a9f9]
[   147.303] (EE)
[   147.303] (EE) Segmentation fault at address 0x64

MPlayer SVN
xorg-server 1.13.2
xf86-video-intel 2.21.3
GT1
XFCE without compositing
Comment 1 Chris Wilson 2013-03-02 20:44:04 UTC
I need 'addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so 0xbf149 0xbfecc'
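The two offsets in that command come from frames 3 and 4 of the backtrace: each unsymbolized frame is printed as (module base + offset) [absolute address], and addr2line wants the offset within intel_drv.so. A minimal Python sketch of that extraction, assuming the exact frame format shown in the log; the function name module_offsets is hypothetical:

```python
import re

# A frame such as "intel_drv.so (0x3179eda3000+0xbf149) [0x3179ee62149]"
# prints (module base + module offset) [absolute address].
FRAME_RE = re.compile(r"\((0x[0-9a-f]+)\+(0x[0-9a-f]+)\) \[(0x[0-9a-f]+)\]")


def module_offsets(backtrace: str, module: str) -> list[str]:
    """Return the module-relative offsets of the frames in `module`,
    as hex strings suitable for `addr2line -e <path to module>`."""
    offsets = []
    for line in backtrace.splitlines():
        if module not in line:
            continue
        m = FRAME_RE.search(line)
        if m is None:
            # Symbolized frames such as "(DRI2SwapBuffers+0x27d)" carry no base.
            continue
        base, offset, absolute = (int(g, 16) for g in m.groups())
        if base + offset != absolute:
            raise ValueError(f"inconsistent frame: {line!r}")
        offsets.append(hex(offset))
    return offsets
```

Run over the ten frames of the backtrace with module "intel_drv.so", it yields 0xbf149 and 0xbfecc, the two values in the command.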
Comment 2 JS 2013-03-02 22:16:26 UTC
addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so 0xbf149 0xbfecc
xf86-video-intel-2.21.3/src/sna/kgem.h:347
xf86-video-intel-2.21.3/src/sna/sna_dri.c:1858

addr2line -i -f -e /usr/lib64/xorg/modules/drivers/intel_drv.so 0xbf149 0xbfecc
kgem_bo_reference
xf86-video-intel-2.21.3/src/sna/kgem.h:347
sna_dri_copy_to_front
xf86-video-intel-2.21.3/src/sna/sna_dri.c:729
sna_dri_immediate_blit
xf86-video-intel-2.21.3/src/sna/sna_dri.c:1858
sna_dri_schedule_swap
xf86-video-intel-2.21.3/src/sna/sna_dri.c:2187
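With -f, addr2line prints a function-name line before each file:line location, and with -i it also lists the functions an address was inlined into, so the second output reads as alternating name/location lines. A small Python sketch that pairs them up, assuming that two-line layout; the helper name pair_frames is hypothetical. The output does not mark where one address's chain ends and the next begins; the first command's output (kgem.h:347 and sna_dri.c:1858) shows where each chain starts.

```python
def pair_frames(output: str) -> list[tuple[str, str]]:
    """Pair each function name printed by `addr2line -f` with the
    file:line location printed on the line after it."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected alternating function/location lines")
    # Even-indexed lines are function names, odd-indexed lines locations.
    return list(zip(lines[0::2], lines[1::2]))
```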
Comment 3 Chris Wilson 2013-03-02 22:20:33 UTC
Hmm, that's quite scary. How frequently does this occur?
Comment 4 Chris Wilson 2013-03-02 22:21:22 UTC
Can you also attach Xorg.0.log so that I have the full hardware info?
Comment 5 JS 2013-03-02 22:35:28 UTC
Created attachment 75802 [details]
Xorg.log

I can reproduce with 100% reliability with kernel 3.8.1. It never happened with kernel 3.7.9.
Comment 6 Chris Wilson 2013-03-02 22:51:37 UTC
As you can trigger this bug trivially, do you mind recompiling xf86-video-intel with --enable-debug=full and attaching the full Xorg.log? I can see how it can explode, but I can't exactly see how it gets into that situation.
Comment 7 JS 2013-03-02 23:20:35 UTC
Created attachment 75807 [details]
Xorg.log with debug
Comment 8 Chris Wilson 2013-03-03 00:01:59 UTC
Ok, that explains the missing piece of the puzzle. But it opens up another question: are you manipulating capabilities on that system? The root cause of the error is that when we try to perform the vsync'ed update, we get rejected with EPERM despite being ostensibly the DRM_MASTER and CAP_SYS_ADMIN.
Comment 9 JS 2013-03-03 00:06:36 UTC
I have patched Xserver to drop root privileges after start-up.
Comment 10 Chris Wilson 2013-03-03 00:28:45 UTC
You are dropping privileges after we query whether we can use a particular root-only interface. I can fix the crash, but not the bigger problem without at least a glimpse at your patch.
Comment 11 JS 2013-03-03 00:34:56 UTC
setgid and setuid are called before Dispatch() is called in dix/main.c
Comment 12 JS 2013-03-03 00:45:44 UTC
I have disabled vsync globally with vblank_mode=0; only MPlayer is started with "unset vblank_mode".
If I start MPlayer without "unset vblank_mode", there is no crash.
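The setup described here, vsync disabled for every client except MPlayer, comes down to per-process environments. A sketch in Python, assuming Mesa reads vblank_mode from the client's environment as the report implies; the helper client_env is hypothetical and only builds the environment dict, it does not launch anything:

```python
def client_env(base: dict[str, str], allow_vsync: bool) -> dict[str, str]:
    """Return a copy of `base` for launching a GL client.

    allow_vsync=False keeps the global vblank_mode=0 (vsync disabled).
    allow_vsync=True drops the variable, the equivalent of
    `unset vblank_mode`, so Mesa's default handling applies again.
    """
    env = dict(base)  # never mutate the caller's environment
    if allow_vsync:
        env.pop("vblank_mode", None)
    else:
        env["vblank_mode"] = "0"
    return env

# Hypothetical use, with FILE standing for the movie to play:
# subprocess.run(["mplayer", "-vo", "gl", FILE],
#                env=client_env(dict(os.environ), allow_vsync=True))
```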
Comment 13 Chris Wilson 2013-03-03 00:54:36 UTC
So the privileges are dropped after all the setup is done, and there is no appropriate callback to [re-]evaluate state after dropping privileges. I guess either do the drop earlier, though that is always going to be fragile, or introduce a framework to do so and notify the drivers.
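The second option, a framework that tells drivers when privileges change, could look roughly like the following Python model. Everything in it is hypothetical: PrivilegeNotifier, SecureBatchDriver and the has_privilege probe are illustrative names, not X server or driver APIs. It shows the ordering problem from comments 8 to 11: a capability probed once at start-up goes stale when the server later drops root before Dispatch(), unless the driver is told to probe again.

```python
from typing import Callable


class PrivilegeNotifier:
    """Hypothetical server-side hook list, run after privileges change."""

    def __init__(self) -> None:
        self._callbacks: list[Callable[[], None]] = []

    def register(self, callback: Callable[[], None]) -> None:
        self._callbacks.append(callback)

    def notify(self) -> None:
        for callback in self._callbacks:
            callback()


class SecureBatchDriver:
    """Hypothetical driver caching whether the root-only path is usable."""

    def __init__(self, has_privilege: Callable[[], bool],
                 notifier: PrivilegeNotifier) -> None:
        self._has_privilege = has_privilege
        self.use_secure_batches = has_privilege()  # probed once at start-up
        notifier.register(self.reevaluate)

    def reevaluate(self) -> None:
        # Re-probe instead of trusting the start-up answer.
        self.use_secure_batches = self._has_privilege()

    def swap_mode(self) -> str:
        # Without the privileged path, fall back to a copy without vsync.
        return "vsync'ed copy" if self.use_secure_batches else "plain copy"
```

The design choice is to keep the probe in the driver and only add a notification point in the server, so the server does not need to know which driver features depend on privileges.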
Comment 14 JS 2013-03-03 08:40:21 UTC
Can I patch kernel 3.8 to behave like kernel 3.7,
or patch the driver to behave with kernel 3.8 the same as with kernel 3.7?
Could a compile-time or runtime option be added to the kernel or driver to revert this behavior?
Can that root-only interface be modified to work for certain non-root users?
There are devices like /dev/dri/card0 that allow only some users to talk to the kernel. Could something like that be used for the vsync'ed update?
Comment 15 Chris Wilson 2013-03-03 09:16:39 UTC
(In reply to comment #14)
> Can I patch kernel 3.8 to behave like kernel 3.7,
> or patch the driver to behave with kernel 3.8 the same as with kernel 3.7?

Option "SwapbuffersWait" "false"

> Could a compile-time or runtime option be added to the kernel or driver to
> revert this behavior?

It does a runtime check that you are subverting.

> Can that root-only interface be modified to work for certain non-root users?

No, in order to do vsync'ed updates one needs to write to hardware registers. It was such a fragile single-purpose interface that we created a general-purpose interface so that we could do such ugly hardware fixups from userspace within the command buffers.

> There are devices like /dev/dri/card0 that allow only some users to talk
> to the kernel. Could something like that be used for the vsync'ed update?

It is using GEM to do the updates. You've just broken the code and so get to fix it.
Comment 16 Chris Wilson 2013-03-03 09:34:54 UTC
This should fix the crash:

commit cd313a8d5d1363929bebac83f81e347b4a9e70f1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Mar 2 22:52:15 2013 +0000

    sna/dri: Guard against failed batch submission
    
    Avoid dereferencing a NULL bo if we do not submit a batch for the copy
    operation.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61708
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

But we still lack a way of dropping privileges without confusing the driver.
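The commit message describes a guard against a NULL buffer object when no batch is submitted for the copy. The fault address 0x64 in the log fits that picture: it is a small offset from address zero, which is what reading a field through a NULL struct pointer produces. The Python model below only illustrates the guard pattern, with None standing in for a NULL bo; Bo, submit_copy and copy_to_front are simplified, hypothetical stand-ins loosely named after the functions in the backtrace, not the actual patch.

```python
from typing import Optional


class Bo:
    """Hypothetical stand-in for a kgem buffer object."""

    def __init__(self) -> None:
        self.refcnt = 1


def bo_reference(bo: Bo) -> Bo:
    # Like kgem_bo_reference: bumps the count, so it must never see None.
    bo.refcnt += 1
    return bo


def submit_copy(batch_submitted: bool) -> Optional[Bo]:
    # Hypothetical: yields the bo of the submitted batch, or None if the
    # batch was not submitted (e.g. rejected by the kernel with EPERM).
    return Bo() if batch_submitted else None


def copy_to_front(batch_submitted: bool) -> Optional[Bo]:
    bo = submit_copy(batch_submitted)
    if bo is None:
        return None  # the guard: skip the reference instead of crashing
    return bo_reference(bo)
```

Without the `if bo is None` check, the failed-submission path would call bo_reference(None) and raise, the Python analogue of the segfault.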
Comment 17 JS 2013-03-03 11:13:03 UTC
Option "SwapbuffersWait" "false"
disables vsync, doesn't it?

It looks like the driver or kernel 3.8 is broken, since vsync works without root on kernel 3.7+SNA and kernel 3.8+UXA, but crashes on kernel 3.8+SNA.

How can that runtime check be modified to restore the kernel 3.7 behavior on kernel 3.8?

I compiled the driver from git. Running MPlayer with "unset vblank_mode" for a few seconds causes a huge performance regression.
glxgears runs at:
5100 fps before MPlayer, as usual
370 fps after MPlayer; I haven't seen such low fps before
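To put the glxgears figures in frame-time terms, a quick calculation with frame time = 1000 / fps milliseconds:

```python
def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


before, after = 5100, 370  # glxgears fps reported above
slowdown = before / after
# prints: 0.196 ms -> 2.703 ms (13.8x slower)
print(f"{frame_time_ms(before):.3f} ms -> {frame_time_ms(after):.3f} ms "
      f"({slowdown:.1f}x slower)")
```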
Comment 18 Chris Wilson 2013-03-03 11:18:00 UTC
(In reply to comment #17)
> It looks like the driver or kernel 3.8 is broken, since vsync works without
> root on kernel 3.7+SNA and kernel 3.8+UXA, but crashes on kernel 3.8+SNA.

vsync does not work on 3.7. It requires a feature introduced in 3.8.
Comment 19 Chris Wilson 2013-03-03 11:18:11 UTC
And UXA does not do vsync.
Comment 20 JS 2013-03-03 11:24:56 UTC
There is tearing when I run MPlayer with vblank_mode=0, but with "unset vblank_mode" there is no tearing on kernel 3.7.
Comment 21 Chris Wilson 2013-03-03 11:28:27 UTC
I can categorically state that is pure coincidence.
Comment 22 JS 2013-03-03 11:38:36 UTC
According to the intel manual page, it is a feature of SwapbuffersWait.

If enabled, the calls will avoid tearing by making sure the display scanline is outside of the area to be copied before the copy occurs.
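The man-page sentence describes a wait-then-copy rule. A simplified Python model, assuming the scanline can be sampled repeatedly and the copy covers rows y1 (inclusive) to y2 (exclusive); scanline_clear and first_safe_sample are hypothetical names. Per comment 15, the driver emits such waits from within the command buffers rather than polling from the CPU as this sketch does.

```python
from typing import Iterable, Optional


def scanline_clear(scanline: int, y1: int, y2: int) -> bool:
    """True when the beam is outside the rows [y1, y2) about to be copied."""
    return not (y1 <= scanline < y2)


def first_safe_sample(samples: Iterable[int], y1: int, y2: int) -> Optional[int]:
    """Index of the first scanline sample at which the copy may start,
    or None if the beam never left the region in the given samples."""
    for i, line in enumerate(samples):
        if scanline_clear(line, y1, y2):
            return i
    return None
```

For example, with the copy covering rows 100 to 250 and scanline samples 100, 150, 200, 260, the copy may start at the fourth sample (index 3).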
Comment 23 JS 2013-06-18 15:23:39 UTC
I can avoid the crash by disabling the use of secure batches in the driver.
Can you add an xorg.conf option for disabling secure batches?

MPlayer uses glXSwapBuffers, which solves tearing with any kernel.
Comment 24 Chris Wilson 2013-06-18 15:27:55 UTC
(In reply to comment #23)
> I can avoid the crash by disabling the use of secure batches in the driver.
> Can you add an xorg.conf option for disabling secure batches?

(In reply to comment #22)
> According to the intel manual page, it is a feature of SwapbuffersWait.

I think you answered your own question.

But the root cause of this issue is your broken patches.
Comment 25 JS 2013-06-18 15:41:49 UTC
I didn't answer my own question.
If secure batches are enabled, the driver crashes when SwapbuffersWait is enabled.
If secure batches are disabled, I can keep SwapbuffersWait enabled and there is neither a crash nor tearing.
Can you add an xorg.conf option named "SecureBatches" that is enabled by default?
