Bug 91254

Summary: (regression) video using VA-API on Intel is slow and freezes the system with mesa 10.6 or 10.6.1
Product: Mesa
Component: Mesa core
Status: RESOLVED FIXED
Severity: major
Priority: high
Version: 10.6
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Tomasz C. <tlinux>
Assignee: mesa-dev
QA Contact: Jordan Justen <jljusten>
CC: chris, frdsktp, idr, lantw44, melko, tlinux, vogelchr
URL: https://bugs.archlinux.org/task/45459
Whiteboard:
i915 platform: i915 features:

Description Tomasz C. 2015-07-07 08:39:42 UTC
After upgrading mesa from 10.5.7 to 10.6.0-1 on Intel graphics, the display slows down and freezes whenever VA-API is in use.

Additional info:

* package version(s)
mesa 10.6.0-1
mesa-libgl 10.6.0-1

* config and/or log files etc.
System: 4.0.6-1-ck x86_64 (64 bit), (tested also 4.0.5-2), Desktop: KDE (Plasma 5.3)
CPU: Dual core Intel Core i5 M 450 (-HT-MCP-) cache: 3072 KB 
Graphics: Card: Intel Core Processor Integrated Graphics Controller
Display Server: X.Org 1.17.2 driver: intel Resolution: 1920x1080@60.00hz
GLX Renderer: Mesa DRI Intel Ironlake Mobile GLX Version: 2.1 Mesa 10.6.0

The problem occurs on an Intel Core i5 M 450 (first-generation Intel Core, Nehalem). I also tested an i3-3220T (third generation, Ivy Bridge) and an i3-4005U (fourth generation, Haswell), and both work properly.
I did not test second generation (Sandy Bridge).

Steps to reproduce:

Method 1
- install mesa 10.6.0-1 and mesa-libgl 10.6.0-1 (or 10.6.1)
- install mpv and configure it:
vo=opengl
hwdec=vaapi
- play any video

Method 2
- install mesa 10.6.0-1 and mesa-libgl 10.6.0-1
- install kodi
- enable VA-API (Settings > Video > Acceleration)
- play any video

Symptoms: display slows down and freezes

Tested on:
- xf86-video-intel 1:2.99.917+364+gb24e758-1 and 2.99.917-5
- AccelMethod: SNA, UXA, glamor
- Linux 4.0.6, 4.0.5-2, 4.1.1

The problem is visible with most video files, but not all.
You can test with the Jellyfish Video Bitrate Test Files: http://jell.yfish.us/

Only downgrading mesa and mesa-libgl to 10.5.7-1 helps.

Downgrading xf86-video-intel to 2.99.917-5 does not help, hence the suspicion that the problem is in mesa.

Upgrading libva and libva-intel-driver from 1.5.1 to 1.6.0 does not resolve this bug.

The bug has also been reported at:
https://bugs.archlinux.org/task/45459
https://bbs.archlinux.org/viewtopic.php?id=198982
Comment 1 Chris Wilson 2015-07-07 10:17:23 UTC
I suspect this is a dup of bug 90839. Does the regression remain on master or the 10.6 branch?
Comment 2 Tomasz C. 2015-07-07 12:05:03 UTC
On:
mesa-git 10.7.0_devel.71031
mesa-libgl-git 10.7.0_devel.71031
(compiled from git master)
the problem still exists, the same as with 10.6 and 10.6.1.

If I roll these two packages back to version 10.5.7, it works correctly.

How can I help locate the source of the problem?
Comment 3 Chris Wilson 2015-07-07 12:11:46 UTC
You have two end points, so a bisection would be very useful and should only take a few minutes (maybe an hour at most?).
Comment 4 Tomasz C. 2015-07-08 08:30:42 UTC
I ran a bisection; here is the result:

0e0e23ef537c9add672ff322f34e129a07edc55e is the first bad commit
commit 0e0e23ef537c9add672ff322f34e129a07edc55e
Author: Jordan Justen <jordan.l.justen@intel.com>
Date:   Wed Apr 22 11:43:50 2015 -0700

    i965/state: Emit pipeline select when changing pipelines
    
    Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

:040000 040000 725572870b90c40592e3d382faef748469403f67 ae158c5c1fc9c0487dbfad8b338ebdb27300c0a1 M      src


Will this be helpful?
Comment 5 Tomasz C. 2015-07-08 09:16:52 UTC
I have also confirmed this by compiling 10.6.1 with this patch reverted:
https://git.thm.de/tjkl80/mesa/commit/0e0e23ef537c9add672ff322f34e129a07edc55e.patch

It works correctly.
Can we expect a fix in the next version?
Comment 6 melko 2015-07-18 07:30:04 UTC
*** Bug 91343 has been marked as a duplicate of this bug. ***
Comment 7 Jordan Justen 2015-07-21 03:18:44 UTC
(In reply to Tomasz C. from comment #0)
> * config and/or log files etc.
> GLX Renderer: Mesa DRI Intel Ironlake Mobile GLX Version: 2.1 Mesa 10.6.0
> 
> The problem occurs on an Intel Core i5 M 450 (first-generation Intel Core,
> Nehalem). I also tested an i3-3220T (third generation, Ivy Bridge) and an
> i3-4005U (fourth generation, Haswell), and both work properly.
> I did not test second generation (Sandy Bridge).

To confirm, the issue only happens on Ironlake for you?

I was unable to reproduce on GM45 or Ivy Bridge using
10.6.0 or master.

I tried the 3, 5, 10 and 70 Mbps versions, with
this command line:

mpv --vo=opengl --hwdec=vaapi Jellyfish-3-Mbps.mkv
Comment 8 Tomasz C. 2015-07-21 11:24:27 UTC
> To confirm, the issue only happens on Ironlake for you?
> 
> I was unable to reproduce on GM45 or Ivy Bridge using
> 10.6.0 or master.
> 
> I tried the 3, 5, 10 and 70 Mbps versions, with
> this command line:
> 
> mpv --vo=opengl --hwdec=vaapi Jellyfish-3-Mbps.mkv

Yes, the issue only happens on the Intel Core i5 M 450, which probably has Ironlake graphics.
On the other processors I listed, everything works properly.

According to bug report 91343, the issue also occurs on an Intel Core i5 M 460, which probably also has Ironlake graphics.
Comment 9 melko 2015-07-21 13:58:37 UTC
(In reply to Tomasz C. from comment #8)
> 
> Yes, the issue only happens on the Intel Core i5 M 450, which probably has
> Ironlake graphics.
> On the other processors I listed, everything works properly.
> 
> According to bug report 91343, the issue also occurs on an Intel Core i5
> M 460, which probably also has Ironlake graphics.

Yeah, the i5 M 460 is Ironlake too; this is what vainfo states:
vainfo: Driver version: Intel i965 driver for Intel(R) Ironlake Mobile - 1.5.1

Sadly I don't have any other CPU to test.
Comment 10 dimon 2015-08-03 14:40:28 UTC
Can someone please look into this issue?
On every start of mpv my system freezes for some seconds and every instance of the Flash plugin in my browser crashes.
I always use mpv for watching Flash videos since it has VA-API support, so this bug is really annoying.

I'm on Ironlake.
Comment 11 Tomasz C. 2015-08-13 05:04:51 UTC
In 10.6.4 this bug is still present.
Can't you just revert the patch that causes the problem?
Comment 12 Tomasz C. 2015-08-23 04:38:40 UTC
In 10.6.5 this bug is still present.

I spent a few hours finding the commit that causes the problem. You have been given the exact commit, and after five releases you are still not able to fix the bug? Or maybe you do not care about users of old cards; if so, just say so.
Comment 13 Jordan Justen 2015-08-23 04:48:11 UTC
(In reply to Tomasz C. from comment #12)
> I spent a few hours finding the commit that causes the problem. You have
> been given the exact commit, and after five releases you are still not able
> to fix the bug? Or maybe you do not care about users of old cards; if so,
> just say so.

I haven't been able to reproduce it, but like I mentioned,
I didn't try on Ironlake. I don't have one of those systems
right now.

Is there any possibility of getting a backtrace?

I looked again at the patch, and I couldn't determine a
code path that would have an issue. Therefore I can't
think of a patch to let you try out.
Comment 14 Chris Wilson 2015-08-23 08:19:45 UTC
diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index e9d9467..2751152 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -878,7 +878,8 @@ brw_upload_invariant_state(struct brw_context *brw)
 {
    const bool is_965 = brw->gen == 4 && !brw->is_g4x;
 
-   brw_select_pipeline(brw, BRW_RENDER_PIPELINE);
+   brw_emit_select_pipeline(brw, BRW_RENDER_PIPELINE);
+   brw->last_pipeline = BRW_RENDER_PIPELINE;
 
    if (brw->gen < 6) {
       /* Disable depth offset clamping. */
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 9de42ce..7577cfc 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -423,9 +423,6 @@ void brw_init_state( struct brw_context *brw )
 {
    struct gl_context *ctx = &brw->ctx;
 
-   /* Force the first brw_select_pipeline to emit pipeline select */
-   brw->last_pipeline = BRW_NUM_PIPELINES;
-
    STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->render_atoms));
    STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->render_atoms));
    STATIC_ASSERT(ARRAY_SIZE(gen7_render_atoms) <=
Comment 15 Jordan Justen 2015-08-23 09:01:29 UTC
(In reply to Chris Wilson from comment #14)
> diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c
> b/src/mesa/drivers/dri/i965/brw_misc_state.c
> index e9d9467..2751152 100644
> --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> @@ -878,7 +878,8 @@ brw_upload_invariant_state(struct brw_context *brw)
>  {
>     const bool is_965 = brw->gen == 4 && !brw->is_g4x;
>  
> -   brw_select_pipeline(brw, BRW_RENDER_PIPELINE);
> +   brw_emit_select_pipeline(brw, BRW_RENDER_PIPELINE);
> +   brw->last_pipeline = BRW_RENDER_PIPELINE;
>  
>     if (brw->gen < 6) {
>        /* Disable depth offset clamping. */
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index 9de42ce..7577cfc 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -423,9 +423,6 @@ void brw_init_state( struct brw_context *brw )
>  {
>     struct gl_context *ctx = &brw->ctx;
>  
> -   /* Force the first brw_select_pipeline to emit pipeline select */
> -   brw->last_pipeline = BRW_NUM_PIPELINES;
> -
>     STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->render_atoms));
>     STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->render_atoms));
>     STATIC_ASSERT(ARRAY_SIZE(gen7_render_atoms) <=

brw_init_state calls brw_upload_initial_gpu_state
which calls brw_upload_invariant_state, which calls
brw_select_pipeline(brw, BRW_RENDER_PIPELINE), so
this should be the same, right?
Comment 16 dimon 2015-08-23 13:22:36 UTC
I'm happy to see something happening here.
This might be helpful for reproducing this bug.

I don't use a compositing window manager, so for me 'mpv --vo=opengl --hwdec=vaapi' on its own doesn't hang.
But if I first start any GL program, e.g. glxgears, and then start 'mpv --vo=opengl --hwdec=vaapi', my system freezes for a few seconds and glxgears crashes with this error: "intel_do_flush_locked failed: Input/output error"
Comment 17 Chris Wilson 2015-08-23 17:21:48 UTC
*** Bug 91382 has been marked as a duplicate of this bug. ***
Comment 18 Tomasz C. 2015-08-24 04:47:29 UTC
Chris's patch from comment #14 works properly.
I hope it will make it into the next release.

I am happy that this has been resolved.
Thanks to Chris for helping to fix this bug.
Comment 19 Chris Wilson 2015-08-24 08:01:42 UTC
commit 4e5752e2b78243a71766538f62ca0a80488047a7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Aug 23 09:24:57 2015 +0100

    i965: Always re-emit the pipeline select during invariant state emission
    
    On the older platforms where we don't have logical contexts preserving
    state across batches, we emit the invariant state setup on every batch
    using the brw_invariant_state atom. This includes the pipeline selection
    which is cached with the introduction of
    
    commit 0e0e23ef537c9add672ff322f34e129a07edc55e
    Author: Jordan Justen <jordan.l.justen@intel.com>
    Date:   Wed Apr 22 11:43:50 2015 -0700
    
        i965/state: Emit pipeline select when changing pipelines
    
    However, we do not reset the cache between batches on context-less
    platforms resulting in us not setting the pipeline selection and can
    cause GPU hangs if a media pipelined was loaded in the meantime (e.g.
    mixing mplayer/gstreamer using libva and gnome-shell). A simple solution
    is to just forcibly re-emit the pipeline select along with the invariant
    state and reset the cache at that point.
    
    Reported-and-tested-by: Tomasz C. <tomaszc@o2.pl>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91254
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Jordan Justen <jordan.l.justen@intel.com>
    Cc: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
    Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Comment 20 Chris Wilson 2015-08-24 12:22:36 UTC
*** Bug 91651 has been marked as a duplicate of this bug. ***
Comment 21 Rhys Kidd 2015-08-29 04:04:48 UTC
Confirmed as resolved for me on IronLake with Mesa git. Thanks.
