Bug 84744 - ZaphodHeads need O_NONBLOCK support
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2014-10-07 10:26 UTC by M. G.
Modified: 2017-07-24 22:51 UTC

Attachments
xorg.conf (2.12 KB, text/plain), 2014-10-07 10:26 UTC, M. G.
Xorg.0.log (16.64 KB, text/plain), 2014-10-07 10:26 UTC, M. G.
dmesg (85.93 KB, text/plain), 2014-10-07 10:27 UTC, M. G.
Xorg.0.log (driver compiled with --enable-debug=full) (382.98 KB, application/octet-stream), 2014-10-07 11:05 UTC, M. G.
dmesg with drm.debug=7 (250.80 KB, text/plain), 2014-10-07 11:05 UTC, M. G.
syslog (zipped) (620.83 KB, application/octet-stream), 2014-10-07 11:27 UTC, M. G.
Xorg.0.log with debug output (1.39 MB, application/octet-stream), 2014-10-07 12:57 UTC, M. G.
O_NONBLOCK support on /dev/dri/cardN (1.12 KB, patch), 2014-10-07 13:09 UTC, Chris Wilson
Xorg.0.log (crash) (303.67 KB, application/octet-stream), 2014-10-07 13:52 UTC, M. G.

Description M. G. 2014-10-07 10:26:04 UTC
Created attachment 107471 [details]
xorg.conf

I have an Intel Celeron G1840 running on an ASUS H87M-PRO motherboard. I have connected a monitor using DisplayPort and my TV using HDMI to the PC and have configured them as two separate screens (see xorg.conf).
When I play a video using VA-API (mpv --hwdec=vaapi -vo=vaapi video.mp4), the X server freezes after the first frame (or possibly a few more), and the only way to recover is to kill the X server over SSH. No error is logged in /sys/kernel/debug/dri/0/i915_error_state.
Occasionally the video plays fine (about 1 in 10 to 20 attempts), but after seeking forward or back the X server freezes as well.

The freeze happens with xf86-video-intel-2.99.916 and the latest Git checkout. I also tested some older releases.
I am using xorg-server-1.15.0, mesa-10.0.4 and libdrm-2.4.52, but also tried the latest ~amd64 versions which Gentoo provides.
I am currently using libva 1.4.1.pre1 (1.3.2-70-g57c86a4), but also tested the releases 1.3.0 and 1.3.2.
The problem occurs with kernel 3.14.14-gentoo and linux-3.16.3-gentoo.
Comment 1 M. G. 2014-10-07 10:26:41 UTC
Created attachment 107472 [details]
Xorg.0.log
Comment 2 M. G. 2014-10-07 10:27:11 UTC
Created attachment 107473 [details]
dmesg
Comment 3 Chris Wilson 2014-10-07 10:41:15 UTC
It will be blocking in the kernel somewhere, perhaps cat /proc/`pidof Xorg`/stack or compile with --enable-debug=full and use drm.debug=7.
Comment 4 Chris Wilson 2014-10-07 10:44:32 UTC
If it was just the video that froze, it would be a stuck pageflip. (It's likely that anyway...)
Comment 5 M. G. 2014-10-07 11:05:09 UTC
Created attachment 107480 [details]
Xorg.0.log (driver compiled with --enable-debug=full)

Log file is zipped due to large file size.
Comment 6 M. G. 2014-10-07 11:05:45 UTC
Created attachment 107481 [details]
dmesg with drm.debug=7
Comment 7 Chris Wilson 2014-10-07 11:13:04 UTC
The log stops just after X wakes up, but before it blocks again on new input. Notably it stops outside of the ddx. At least the two logs are consistent with each other...

Could you attach gdb to X during the freeze and step through to see what it is stuck on?
Comment 8 M. G. 2014-10-07 11:26:20 UTC
Unfortunately I have no /proc/`pidof Xorg`/stack file. Can you tell me how to enable it? I have set CONFIG_STACKTRACE_SUPPORT=y in my kernel config.

After uploading the above files with my second PC, I switched the monitor input back and saw that the video moved on a few frames but the video was still stuck. After moving the mouse the video playback continued fine. I will attach the syslog. Does this help?
Comment 9 M. G. 2014-10-07 11:27:45 UTC
Created attachment 107483 [details]
syslog (zipped)
Comment 10 Chris Wilson 2014-10-07 11:36:44 UTC
(In reply to M. G. from comment #8)
> Unfortunately I have no /proc/`pidof Xorg`/stack file. Can you tell me how
> to enable it? I have set CONFIG_STACKTRACE_SUPPORT=y in my kernel config.

Perhaps your process is called X, not Xorg. You can take a look at "ps ax" and manually grab the pid to use above.
 
> After uploading the above files with my second PC, I switched the monitor
> input back and saw that the video moved on a few frames but the video was
> still stuck. After moving the mouse the video playback continued fine. I
> will attach the syslog. Does this help?

Indeed. That sort of ties in with it not appearing to freeze inside the graphics driver... Could you try finding the right /proc/<pid>/stack and/or attaching gdb and getting a backtrace?

Time to read syslog...
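
As a rough illustration of the advice above (check whether the server process is named X or Xorg, then read its /proc/<pid>/stack), here is a minimal Python sketch. The helper names find_pids and read_kernel_stack are hypothetical and not part of any tool mentioned in this report; the sketch assumes a Linux /proc layout, and reading the stack file normally requires root and a kernel built with stack tracing.

```python
import os


def find_pids(names=("Xorg", "X"), proc_root="/proc"):
    """Return the PIDs whose /proc/<pid>/comm matches one of `names`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited or is not readable
        if comm in names:
            pids.append(int(entry))
    return sorted(pids)


def read_kernel_stack(pid, proc_root="/proc"):
    """Return the kernel stack text for `pid`, or None if it is unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as f:
            return f.read()
    except OSError:
        return None  # missing file (kernel config) or insufficient privileges
```

Calling read_kernel_stack(pid) for each result of find_pids() while the server is frozen should show where it is sleeping in the kernel; a None result suggests the stack file is missing or unreadable rather than that the process name was wrong.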
Comment 11 M. G. 2014-10-07 12:53:18 UTC
(In reply to Chris Wilson from comment #10)
> (In reply to M. G. from comment #8)
> > Unfortunately I have no /proc/`pidof Xorg`/stack file. Can you tell me how
> > to enable it? I have set CONFIG_STACKTRACE_SUPPORT=y in my kernel config.
> 
> Perhaps your process is called X, not Xorg. You can take a look at "ps ax"
> and manually grab the pid to use above.

OK, I figured out why the file was missing. I had to recompile my kernel with Kernel hacking/Kernel debugging and Kernel hacking/Tracers enabled.

> > After uploading the above files with my second PC, I switched the monitor
> > input back and saw that the video moved on a few frames but the video was
> > still stuck. After moving the mouse the video playback continued fine. I
> > will attach the syslog. Does this help?
> 
> Indeed. That sort of ties in with it not appearing to freeze inside the
> graphics driver... Could you try finding the right /proc/<pid>/stack and/or
> attaching gdb and getting a backtrace?

Here is the stacktrace output:

[<ffffffff813a20b2>] drm_read+0x162/0x1c0
[<ffffffff811481fb>] vfs_read+0x9b/0x190
[<ffffffff8114849a>] SyS_read+0x4a/0xc0
[<ffffffff815cd812>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

I haven't used gdb before. If you need a backtrace it might take a while because I first have to find out what to do in order to get a backtrace.

> Time to read syslog...

I think this is the relevant part:

[  234.782295] [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 0 : v 7 p(0,-34)@ 234.770961 -> 234.771420 [e 0 us, 0 rep]
[  234.795367] [drm:vblank_disable_fn], disabling vblank on crtc 0
[  234.795383] [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 0 : v 5 p(0,935)@ 234.784052 -> 234.771424 [e 0 us, 0 rep]
[  270.745528] [drm:drm_ioctl], pid=1841, dev=0xe200, auth=1, I915_GEM_BUSY
[  270.745571] [drm:drm_ioctl], pid=1841, dev=0xe200, auth=1, I915_GEM_BUSY
[  270.745591] [drm:drm_ioctl], pid=1841, dev=0xe200, auth=1, I915_GEM_BUSY
[  270.745800] [drm:drm_ioctl], pid=1841, dev=0xe200, auth=1, I915_GEM_THROTTLE
[  270.746352] [drm:drm_ioctl], pid=1841, dev=0xe200, auth=1, DRM_IOCTL_WAIT_VBLANK
[  270.746362] [drm:drm_vblank_get], enabling vblank on crtc 0, ret: 0
[  270.746375] [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 0 : v 5 p(0,1183)@ 270.734067 -> 270.718089 [e 0 us, 0 rep]
[  270.746389] [drm:drm_update_vblank_count], enabling vblank interrupts on crtc 0, missed 2155
[  270.746398] [drm:drm_wait_vblank], waiting on vblank count 16134, crtc 0
[  270.746401] [drm:drm_wait_vblank], returning 16134 to client

Maybe some vblank issue?
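
One way to locate a stall in a log like the excerpt above is to look for the largest jump between consecutive kernel timestamps. In the excerpt the jump sits between 234.795383 and 270.745528, roughly 36 seconds with no DRM activity logged. The following Python sketch does that search; the function name largest_gap is hypothetical, and treating the gap as the freeze window is an assumption, not something the log itself states.

```python
import re

# Kernel log lines start with a bracketed uptime in seconds, e.g. "[  234.795383]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the biggest jump.

    Lines without a leading timestamp are ignored. Returns None when fewer
    than two timestamped lines are present.
    """
    best = None
    prev_ts = prev_line = None
    for line in lines:
        m = TIMESTAMP.match(line)
        if not m:
            continue
        ts = float(m.group(1))
        if prev_ts is not None:
            gap = ts - prev_ts
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best
```

Feeding it the lines of a dmesg or syslog capture returns the two log lines that bracket the quietest period, which narrows down where to start reading.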
Comment 12 M. G. 2014-10-07 12:57:45 UTC
Created attachment 107491 [details]
Xorg.0.log with debug output

Here's the Xorg.0.log with the debug output from the previous run. Maybe it helps. I had to use 7Zip because the packed Zip file was still too large.
Comment 13 Chris Wilson 2014-10-07 13:09:14 UTC
Oh, ZaphodHeads! Sorry, completely misinterpreted dual-head configuration. Known issue in the drm module.
Comment 14 Chris Wilson 2014-10-07 13:09:49 UTC
Created attachment 107492 [details] [review]
O_NONBLOCK support on /dev/dri/cardN
Comment 15 M. G. 2014-10-07 13:49:41 UTC
(In reply to Chris Wilson from comment #13)
> Oh, ZaphodHeads! Sorry, completely misinterpreted dual-head configuration.
> Known issue in the drm module.

Is dual-head the wrong term for this configuration? Sorry, English is not my native language.

(In reply to Chris Wilson from comment #14)
> Created attachment 107492 [details] [review]
> O_NONBLOCK support on /dev/dri/cardN

Great, many thanks! This fixes the freeze. However, if I play the video on the second screen (DISPLAY=:0.1 mpv --hwdec=vaapi --vo=vaapi video.mp4), the X server crashes and I only see a blank screen with a static (non-blinking) cursor in the top-left corner on both displays. I will attach the Xorg.0.log.
Comment 16 M. G. 2014-10-07 13:52:28 UTC
Created attachment 107495 [details]
Xorg.0.log (crash)

(packed with 7Zip)
Comment 17 Chris Wilson 2014-10-07 15:34:02 UTC
No, it's just that Zaphod is so unusual that I assume multihead to mean multimonitor. Looks like I broke it around 2.99.912:

commit c481254c17316e6c8299705fd0a218484dd369fe
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 7 16:24:18 2014 +0100

    sna: Retrieve private pointer from vblank cookie
    
    When using ZaphodHeads, we share the /dev/dri/card0 fd between both
    screens. So when we read an event back from the fd, it could be for
    either head and we cannot assume that our private pointer is valid for
    the data passed along with the event. Instead, we need to retreive that
    pointer from the event.
    
    Fixes regression from
    
    commit 8369166349c92a20d9a2e7d0256e63f66fe2682b [2.99.912]
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Wed Jun 4 08:29:51 2014 +0100
    
        sna/dri2: Enable immediate buffer exchanges
    
    although the design bug is actually older.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84744#c15
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
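
The idea in the commit message above (when one fd is shared, take the owner from the event itself rather than from whichever screen happened to read it) can be sketched in Python. This is not the SNA driver code. The record layout follows struct drm_event_vblank from the kernel's DRM UAPI header as I understand it (type, length, user_data, tv_sec, tv_usec, sequence, and a final 32-bit field); the owners dictionary and the dispatch function are hypothetical stand-ins for the driver's per-screen private pointers.

```python
import struct

# Assumed layout of struct drm_event_vblank: u32 type, u32 length,
# u64 user_data, u32 tv_sec, u32 tv_usec, u32 sequence, u32 reserved/crtc_id.
EVENT = struct.Struct("=IIQIIII")
DRM_EVENT_VBLANK = 0x01
DRM_EVENT_FLIP_COMPLETE = 0x02


def dispatch(buffer, owners):
    """Route every event in `buffer` to the owner registered for its cookie.

    `owners` is a hypothetical dict mapping user_data cookies to per-screen
    state. The reader never assumes the event belongs to itself.
    Returns a list of (owner, sequence) pairs in arrival order.
    """
    routed = []
    offset = 0
    while offset + 8 <= len(buffer):
        ev_type, length = struct.unpack_from("=II", buffer, offset)
        if length < 8 or offset + length > len(buffer):
            break  # truncated or malformed record; stop parsing
        is_vblank = ev_type in (DRM_EVENT_VBLANK, DRM_EVENT_FLIP_COMPLETE)
        if is_vblank and length >= EVENT.size:
            _, _, cookie, _sec, _usec, seq, _ = EVENT.unpack_from(buffer, offset)
            owner = owners.get(cookie)
            if owner is not None:
                routed.append((owner, seq))
        offset += length  # events are variable length; trust the header
    return routed
```

In the driver the cookie is a pointer rather than a dictionary key, but the design point is the same: the lookup is keyed by data carried in the event, so it stays correct no matter which head drains the fd.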
Comment 18 M. G. 2014-10-07 16:14:09 UTC
Wow, many thanks for the fast fix! Now everything works as desired. Great!

I noticed that your drm:nonblock patch is already over a year old. What are the chances that it will become part of the mainline kernel?

And one last question: Is the TearFree option unsupported with ZaphodHeads? I get a segfault if I have it in my xorg.conf (no difference whether it is in both Device sections or just in one):

(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x42) [0x58f872]
(EE) 1: /usr/bin/X (0x400000+0x193519) [0x593519]
(EE) 2: /lib64/libpthread.so.0 (0x7f13603f1000+0x11250) [0x7f1360402250]
(EE) 3: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f135c54f000+0x6f5bb) [0x7f135c5be5bb]
(EE) 4: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f135c54f000+0x6f4ee) [0x7f135c5be4ee]
(EE) 5: /usr/bin/X (WakeupHandler+0x9c) [0x43f79c]
(EE) 6: /usr/bin/X (WaitForSomething+0x1a4) [0x58d094]
(EE) 7: /usr/bin/X (0x400000+0x3ab81) [0x43ab81]
(EE) 8: /usr/bin/X (0x400000+0x3ebea) [0x43ebea]
(EE) 9: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x7f135f076db5]
(EE) 10: /usr/bin/X (0x400000+0x29641) [0x429641]
(EE) 
(EE) Segmentation fault at address 0x7fff2953f050
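
Frames 3 and 4 of the backtrace above are printed only as a load base plus an offset inside intel_drv.so. A possible way to turn them into function names is to pass the offsets to addr2line against a driver build with debug symbols. The Python sketch below extracts those offsets and builds the command line; the names driver_frames and addr2line_args are hypothetical, and the sketch assumes the module is a shared object whose file addresses start at zero, so that the printed offset can be used directly. It does not apply to the frames in /usr/bin/X, whose base of 0x400000 suggests a fixed-address executable.

```python
import re

# Matches lines such as:
# (EE) 3: /path/intel_drv.so (0x7f135c54f000+0x6f5bb) [0x7f135c5be5bb]
FRAME = re.compile(
    r"^\(EE\) (\d+): (\S+) \((\w+)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]"
)


def driver_frames(lines, module_suffix="intel_drv.so"):
    """Return (frame_index, module_path, offset) for unsymbolised frames."""
    frames = []
    for line in lines:
        m = FRAME.match(line)
        if not m:
            continue
        index, module, base, offset, absolute = m.groups()
        if not module.endswith(module_suffix) or not base.startswith("0x"):
            continue  # other module, or already symbolised (name+offset)
        if int(base, 16) + int(offset, 16) != int(absolute, 16):
            continue  # inconsistent line; do not trust it
        frames.append((int(index), module, int(offset, 16)))
    return frames


def addr2line_args(module, offset):
    """Build an addr2line command line (-f prints the function name)."""
    return ["addr2line", "-f", "-e", module, hex(offset)]
```

The resulting argument list can be handed to subprocess.run; without debug symbols in the driver, addr2line will only print question marks.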
Comment 19 Chris Wilson 2014-10-07 16:20:18 UTC
Ah, it will be supported shortly... I have just never tested TearFree with Zaphod...
Comment 20 M. G. 2014-10-07 16:31:43 UTC
OK, if you need a second tester, let me know :) And thanks again!
Comment 21 Chris Wilson 2014-10-07 20:14:23 UTC
Bleh, spotted the breakage I think:

commit 57c48e4973ac0dad09744eaa82315a5f023094e7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 7 21:08:52 2014 +0100

    sna: Fix the TearFree flip handler for the change in argument order
    
    From the last commit c481254c17316e6c8299705fd0a218484dd369fe,
    we need to pass along the private pointer as the flip data as we no
    longer pass it down from the caller.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Ok, haven't tested TearFree+Zaphod yet though.
Comment 22 Chris Wilson 2014-10-08 08:37:03 UTC
Tested TearFree+Zaphod on ilk VGA + HDMI, seems to be functional for minimal testing at least.
Comment 23 Chris Wilson 2014-10-08 11:35:59 UTC
Fingers crossed that Zaphod works again, all that remains is getting that patch into the kernel.
Comment 24 M. G. 2014-10-08 15:21:55 UTC
You're awesome! Now the X-Server doesn't segfault anymore and the tearing seems to be completely gone. I will test this thoroughly and report back if I experience any issues, but my first impression is very good. Thank you very much!
Comment 25 Mika Kuoppala 2014-11-05 16:06:01 UTC
Based on the last comment, and since all relevant commits seem to be in the repos, marking as resolved.

M. G. please reopen or verify.
Comment 26 M. G. 2014-11-05 16:31:29 UTC
The bug in the Xorg driver is fixed but the DRM patch (see comment 14) needs to be pushed into the kernel. It is working for me because I have manually patched my kernel. Sorry for the confusion.
Comment 27 Chris Wilson 2014-11-05 16:34:11 UTC
What Mika forgot to do (which is v.important) was cite the kernel commit:

commit bd008e5b2953186fc0c6633a885ade95e7043800
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 7 14:13:51 2014 +0100

    drm: Implement O_NONBLOCK support on /dev/dri/cardN
    
    The implmentation is simple in the extreme: we only want to wait for
    events if the device was opened in blocking mode, otherwise we grab what
    is available and report an error if there was none.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: dri-devel@lists.freedesktop.org
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Testcase: igt/kms_flip/nonblocing_read
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
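
The behaviour described in the commit message above (wait only if the fd was opened in blocking mode, otherwise return what is available or an error) is the usual O_NONBLOCK contract. The Python sketch below demonstrates that contract on an ordinary pipe as a stand-in, since it does not touch /dev/dri/cardN; the names read_available and demo are hypothetical. On Linux the "nothing available" error is conventionally EAGAIN, which Python surfaces as BlockingIOError.

```python
import os


def read_available(fd, size=4096):
    """Return whatever is queued on a non-blocking fd, or None if nothing is."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None  # EAGAIN: nothing queued, and the caller did not sleep


def demo():
    """Show an empty read followed by a read with data on a non-blocking pipe."""
    r, w = os.pipe()
    os.set_blocking(r, False)  # same effect as opening with O_NONBLOCK
    try:
        empty = read_available(r)   # no data yet: returns None immediately
        os.write(w, b"event")
        ready = read_available(r)   # data queued: returns it
    finally:
        os.close(r)
        os.close(w)
    return empty, ready
```

With a blocking fd the first read in demo() would sleep until data arrived, which matches the drm_read frame in the stack trace from comment 11: a reader that finds no event for itself on the shared fd simply waits.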
Comment 28 M. G. 2014-11-05 16:48:29 UTC
Oh, great! I couldn't find the commit because I was looking at the wrong Git tree. Sorry.

Again many thanks for your efforts!

