Bug 62373

Summary: [snb rc6] 2.21.3, Linux 3.8.1, SNA: hard lockup when watching videos with MPlayer
Product: DRI
Reporter: Michael Stapelberg <michael+freedesktop>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WONTFIX
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: bugs.freedesktop.org, mthode
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform: SNB
i915 features: power/GT

Description Michael Stapelberg 2013-03-15 15:14:50 UTC
Sometimes, when watching a video with MPlayer, my machine locks up hard so that I have to turn off power before I can use it again in any way. It doesn’t happen all the time, and I have not yet found a way to reproduce it.

Unfortunately, I don’t have any logfiles from that. Do you have any suggestions on how to gather logs in such a case?

I’m using an Intel DH67GD mainboard with an Intel Core i7-2600K.
Comment 1 Chris Wilson 2013-03-15 15:21:54 UTC
You can try netconsole, but usually snb locks up so hard that even netconsole doesn't capture any dying whimpers.

Are you using mplayer -vo gl or -vo xv?

Have you tried with i915.i915_enable_rc6=0?

Have you tried with Option "SwapbuffersWait" "false"?
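
Editorial note on the netconsole suggestion above: netconsole sends kernel messages as UDP datagrams to a second machine, so something has to be listening there. The following is a minimal sketch of such a receiver, not something taken from this report. It assumes netconsole's default target port 6666; the names `format_record` and `listen` are hypothetical.

```python
import socket

# Assumption: netconsole's default target port. Adjust if the sender is
# configured with a different one.
NETCONSOLE_PORT = 6666


def format_record(payload: bytes, sender: str) -> str:
    """Turn one received datagram into a printable log line."""
    # Kernel messages are normally ASCII; replace anything undecodable
    # instead of raising, since the last bytes before a lockup may be garbage.
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    return f"[{sender}] {text}"


def listen(host: str = "0.0.0.0", port: int = NETCONSOLE_PORT) -> None:
    """Print every datagram received on (host, port) until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            payload, (sender, _sender_port) = sock.recvfrom(65535)
            # Flush immediately so nothing is lost if output is piped to a file.
            print(format_record(payload, sender), flush=True)
```

Calling `listen()` on the receiving machine and redirecting its output to a file keeps whatever the locked-up box managed to send. As noted in the comment above, a hard SNB lockup may well produce no output at all.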
Comment 2 Michael Stapelberg 2013-03-15 16:47:29 UTC
(In reply to comment #1)
> Are you using mplayer -vo gl or -vo xv?
I am using -vo xv.

> Have you tried with i915.i915_enable_rc6=0?
I will try that when I reboot the next time (i.e. after the next hard lockup :)).

> Have you tried with Option "SwapbuffersWait" "false"?
I have enabled that option now. I’ll update this report as soon as the lockup occurs the next time, or if it doesn’t occur for a month or so.
Comment 3 Michael Stapelberg 2013-04-03 08:42:13 UTC
(In reply to comment #2)
> > Have you tried with Option "SwapbuffersWait" "false"?
> I have enabled that option now. I’ll update this report as soon as the
> lockup occurs the next time, or if it doesn’t occur for a month or so.
With i915.i915_enable_rc6=0 _and_ Option "SwapbuffersWait" "false" I have not had a single lockup for more than two weeks.

Unfortunately, I won’t have access to the box I have been testing this with for the next 2 months, so any further testing will have to wait.
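
Editorial note: when testing a workaround like this over weeks, it helps to confirm after each reboot that the kernel parameter is really in effect. Below is a rough sketch, assuming the parameter was passed on the kernel command line (it will not show up there if it was set through a modprobe option instead). The helper names `parse_cmdline` and `rc6_disabled` are hypothetical, and the parsing is deliberately naive: it does not handle quoted values containing spaces.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into a {name: value} mapping.

    Flags without '=' (such as 'quiet') map to None.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def rc6_disabled(cmdline: str) -> bool:
    """True if i915.i915_enable_rc6=0 appears on the given command line."""
    return parse_cmdline(cmdline).get("i915.i915_enable_rc6") == "0"
```

On the running system this could be used as `rc6_disabled(open("/proc/cmdline").read())`.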
Comment 4 Chris Wilson 2013-04-03 10:46:39 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > > Have you tried with Option "SwapbuffersWait" "false"?
> > I have enabled that option now. I’ll update this report as soon as the
> > lockup occurs the next time, or if it doesn’t occur for a month or so.
> With i915.i915_enable_rc6=0 _and_ Option "SwapbuffersWait" "false" I have
> not had a single lockup since more than two weeks.

Presumably you still encountered a hard lockup with just Option "SwapbuffersWait" or was a reboot forced in the meantime?
Comment 5 Michael Stapelberg 2013-04-03 11:03:47 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > > Have you tried with Option "SwapbuffersWait" "false"?
> > > I have enabled that option now. I’ll update this report as soon as the
> > > lockup occurs the next time, or if it doesn’t occur for a month or so.
> > With i915.i915_enable_rc6=0 _and_ Option "SwapbuffersWait" "false" I have
> > not had a single lockup since more than two weeks.
> 
> Presumably you still encountered a hard lockup with just Option
> "SwapbuffersWait" or was a reboot forced in the meantime?
Sorry for not being more explicit about this: I have not tried just using SwapbuffersWait extensively; a reboot was indeed forced.
Comment 6 Chris Wilson 2013-06-12 09:31:34 UTC
Can you please try with this patch: https://patchwork.kernel.org/patch/2707341/ as it claims to fix some instability with rc6 on SandyBridge?
Comment 7 Chris Wilson 2013-08-07 08:04:01 UTC
*** Bug 67856 has been marked as a duplicate of this bug. ***
Comment 8 Jani Nikula 2013-12-17 12:08:02 UTC
Timeout.

Michael, are you still seeing the issue with later kernels? There seems to have been some back and forth with the patch referenced by Chris in comment #6 - please try 3.13-rc1 or later.
Comment 9 Chris Wilson 2013-12-17 12:11:10 UTC
Nothing has changed. SNB can still randomly hard lock with rc6 and vsync.
Comment 10 Rodrigo Vivi 2014-09-24 20:31:05 UTC
Could you please retest with latest drm-intel-nightly?
Comment 11 Jesse Barnes 2014-12-05 20:33:27 UTC
Hm, another option might be to switch to timeout mode on SNB.  With a long enough timeout, we could apply the "emit a primitive" workaround every time we come out of rc6 and hopefully make things more stable...
Comment 12 Chris Wilson 2014-12-05 20:56:28 UTC
I just saw today that there is a recommendation to toggle PSMI_CTL around WAIT_FOR_EVENT in the SNB bspec. That is probably worth trying...
Comment 13 Chris Wilson 2014-12-08 11:21:46 UTC
Implemented the bspec recommendation:

commit d247cb7d0cdb73736f31612157e47f166af68ba0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Dec 8 10:07:25 2014 +0000

    sna/gen6: Poke PSMI control around WAIT_FOR_EVENT to prevent idling
    
    The bspec recommends preventing the hardware from going to sleep around
    a WAIT_FOR_EVENT, and tells us to use disable sleep bit in PSMI control
    to accomplish this.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=62373
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

It's worth another go...
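
Editorial note: the idea in the commit message can be pictured as bracketing the wait command with two register writes, one that sets the "disable sleep" bit before the wait and one that clears it afterwards. The sketch below only illustrates that shape and is not the code from the commit. Every constant in it is an assumption written from memory (the MI_LOAD_REGISTER_IMM encoding, the PSMI control offset 0x2050, and bit 0 as the sleep-disable bit), as is the masked-write convention in which the upper 16 bits select which lower bits are updated. The function names are hypothetical.

```python
# All values below are assumptions for illustration; check them against the
# bspec or the actual commit before relying on them.
MI_LOAD_REGISTER_IMM = (0x22 << 23) | 1  # assumed opcode, one offset/value pair
PSMI_CTL = 0x2050                        # assumed register offset
PSMI_SLEEP_DISABLE = 1 << 0              # assumed "disable sleep" bit


def masked_bit_enable(bit: int) -> int:
    """Masked write that sets `bit`: the mask goes in the upper 16 bits."""
    return (bit << 16) | bit


def masked_bit_disable(bit: int) -> int:
    """Masked write that clears `bit`: mask set, value bit left at zero."""
    return bit << 16


def wrap_wait_with_psmi(wait_dwords: list) -> list:
    """Return batch dwords: disable sleep, the caller's wait, re-enable sleep."""
    return [
        MI_LOAD_REGISTER_IMM, PSMI_CTL, masked_bit_enable(PSMI_SLEEP_DISABLE),
        *wait_dwords,  # the WAIT_FOR_EVENT command, supplied by the caller
        MI_LOAD_REGISTER_IMM, PSMI_CTL, masked_bit_disable(PSMI_SLEEP_DISABLE),
    ]
```

The wait command itself is passed in as opaque dwords because its encoding depends on which event is being waited for.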
Comment 14 Chris Wilson 2014-12-10 08:31:18 UTC
*** Bug 87163 has been marked as a duplicate of this bug. ***
Comment 15 Andy Furniss 2014-12-10 14:34:14 UTC
(In reply to Chris Wilson from comment #13)
> Implemented the bspec recommendation:
> 
> commit d247cb7d0cdb73736f31612157e47f166af68ba0
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Dec 8 10:07:25 2014 +0000
> 
>     sna/gen6: Poke PSMI control around WAIT_FOR_EVENT to prevent idling
>     
>     The bspec recommends preventing the hardware from going to sleep around
>     a WAIT_FOR_EVENT, and tells us to use disable sleep bit in PSMI control
>     to accomplish this.
>     
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=62373
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> It's worth another go...

OK so not my bug but with a baytrail J1900N this does not prevent a vaapi hard lock (probably when de-interlacing h/w or s/w so fps = refresh) for me.

I only recently started seeing locks - turns out that dri3 was OK for me, but of course it then got disabled by default and I started getting locks.

This is with kodi - it seems they recommend 910 as the last stable driver - and I can lock with 311.

This was tested with head on this commit, kernel nightly and mesa about a week old.
Comment 16 Andy Furniss 2014-12-10 19:48:04 UTC
(In reply to Andy Furniss from comment #15)

> I only recently started seeing locks - turns out that dri3 was OK for me,
> but of course it then got disabled by default and I started getting locks.

It seems I was a bit hasty in calling dri3 OK - I can lock if I try long enough, just that it takes about 20x longer than dri2.
Comment 17 Andy Furniss 2014-12-13 14:45:33 UTC
(In reply to Andy Furniss from comment #16)
> (In reply to Andy Furniss from comment #15)
> 
> > I only recently started seeing locks - turns out that dri3 was OK for me,
> > but of course it then got disabled by default and I started getting locks.
> 
> It seems I was a bit hasty in calling dri3 OK - I can lock if I try long
> enough, just that it takes about 20x longer than dri2.

Looks like the locks were a mesa issue which is now fixed.

I can't point to a commit, but the reason I suspect mesa is that I changed my test case to s/w decode no de-int fps < refresh and could lock/not lock depending on the level of gl output chosen in kodi.

dri2 is still < dri3 - neither lock but 2 glitches occasionally.

Anyway, I guess I am in the wrong bug, sorry for the noise.
Comment 18 Andy Furniss 2015-01-09 16:24:36 UTC
(In reply to Andy Furniss from comment #17)
> (In reply to Andy Furniss from comment #16)
> > (In reply to Andy Furniss from comment #15)
> > 
> > > I only recently started seeing locks - turns out that dri3 was OK for me,
> > > but of course it then got disabled by default and I started getting locks.
> > 
> > It seems I was a bit hasty in calling dri3 OK - I can lock if I try long
> > enough, just that it takes about 20x longer than dri2.
> 
> Looks like the locks were a mesa issue which is now fixed.

Just for completeness - it wasn't mesa, it's just hard to call stable/not when, even on unstable setups, runs of 12 hrs are possible.

Current thinking is Kernel > 3.16.x is unstable on baytrail, kodi developers also saying this, so not just me.
Comment 19 Chris Wilson 2015-03-25 17:28:45 UTC
*** Bug 87163 has been marked as a duplicate of this bug. ***
Comment 20 Elio 2015-08-12 18:37:03 UTC
Waiting for feedback in order to change the status.
Comment 21 Elio 2017-02-10 20:17:16 UTC
No feedback, closing
