Bug 98213 - [skl] fbc clobbers stolen
Summary: [skl] fbc clobbers stolen
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other Linux (All)
Importance: highest blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords: bisected, regression
Depends on:
Blocks:
 
Reported: 2016-10-12 12:15 UTC by Greg White
Modified: 2016-12-13 08:12 UTC
CC List: 2 users

See Also:
i915 platform: SKL
i915 features: display/FBC


Attachments
dmesg (64.25 KB, text/plain)
2016-10-12 12:15 UTC, Greg White
cpu info (9.02 KB, text/plain)
2016-10-12 12:16 UTC, Greg White
error (754.98 KB, text/plain)
2016-10-12 15:39 UTC, Greg White
Updated dmesg (first patch only - still failing) (506.80 KB, text/plain)
2016-10-19 20:20 UTC, Greg White

Description Greg White 2016-10-12 12:15:43 UTC
Created attachment 127238
dmesg

This is on Skylake/4.9pre-rc1.

Enabling FBC (enable_fbc=1) on Skylake causes near-constant GPU hangs/resets. This did not happen with 4.8 final. Booting with enable_fbc=0 or enable_fbc=-1 works.

I understand this is probably unsupported, but thought it might be useful to point this out.

dmesg and cpuinfo attached. This was captured right after boot, with the graphical login started. The hangs continue with the same signature after my desktop environment starts.
Comment 1 Greg White 2016-10-12 12:16:26 UTC
Created attachment 127239
cpu info
Comment 2 Chris Wilson 2016-10-12 15:32:55 UTC
I don't think it will help, but it would be interesting at least to see what /sys/class/drm/card0/error says.
Comment 3 Greg White 2016-10-12 15:39:27 UTC
Created attachment 127249
error

As requested.
Comment 4 Chris Wilson 2016-10-13 12:32:45 UTC
Fwiw, I've applied 

commit 35ca039e0a45fad3b15834059b4ad6697dbc7484
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Oct 13 11:18:14 2016 +0100

    drm/i915: Record the current requests queue for execlists upon hang

to drm-intel-nightly; that should give a little more detail about the hang (in the error state) here.
Comment 5 Paulo Zanoni 2016-10-17 16:31:19 UTC
Since this is a regression, it would be really really helpful if you could bisect the regression for us. Can you please do this?

Thanks,
Paulo
Comment 6 Greg White 2016-10-17 21:13:23 UTC
OK, that took a long time, but looks like this is it:

c58b735fc762e891481e92af7124b85cb0a51fce is the first bad commit
commit c58b735fc762e891481e92af7124b85cb0a51fce
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 18 17:16:57 2016 +0100

    drm/i915: Allocate rings from stolen
    
    If we have stolen available, make use of it for ringbuffer allocation.
    Previously this was restricted to !llc platforms, as writing to stolen
    requires a GGTT mapping - but now that we have partial mappable support,
    the mappable aperture isn't quite so precious so we can use it more
    freely and ringbuffers are a good user for the otherwise wasted stolen.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-18-chris@chris-wilson.co.uk

:040000 040000 1a77b1ba168a72106a74ae5ad3f28a49f40a88b0 2cf3076da77f4753071dfb926e391166a8532354 M      drivers
Comment 7 Paulo Zanoni 2016-10-18 20:09:41 UTC
@Chris: how exactly were you able to conclude that it's FBC that clobbers stolen and not one of the other stolen users? I'm interested in the details so I can investigate further.
Comment 8 Jari Tahvanainen 2016-10-19 08:54:33 UTC
Priority changed as regression w/o workaround
Comment 9 Paulo Zanoni 2016-10-19 20:07:51 UTC
(In reply to Greg White from comment #6)
> OK, that took a long time, but looks like this is it:
> 
> c58b735fc762e891481e92af7124b85cb0a51fce is the first bad commit

Thanks a lot for bisecting this! It's really helpful for us!

Can you please check if the patches located at https://people.freedesktop.org/~pzanoni/bug98213/ solve the problem for you?

It would be really good if you could apply patch 1, then boot with drm.debug=0xe, reproduce the problem and attach the resulting dmesg here.

Also, if you find that applying all 5 patches solves the problem, it would be important to find out which one of these patches alone solves it.

Thanks!
Paulo
Comment 10 Greg White 2016-10-19 20:19:45 UTC
Done.

Updated dmesg attached.

With the first patch only, it still reproduces.

With all 5 applied and enable_fbc=1, everything seems fine.
Comment 11 Greg White 2016-10-19 20:20:28 UTC
Created attachment 127410
Updated dmesg (first patch only - still failing)
Comment 12 Greg White 2016-10-19 20:28:47 UTC
OK, I tested patches 1-4 individually. 1, 2 and 3 alone all fail; 4 alone works.

If you need me to test just 5, let me know.
Comment 13 Paulo Zanoni 2016-10-21 15:57:44 UTC
(In reply to Greg White from comment #12)
> OK, I ran through the patches 1-4.  1,2,3 alone all failed.  4 alone works.
> 
> If you need me to test just 5, let me know.

Thanks a lot for testing this!

Can you please confirm if https://patchwork.freedesktop.org/patch/117250/ fixes the problem for you? This needs to be applied without the other 5 patches I provided.
Comment 14 Greg White 2016-10-21 16:08:53 UTC
OK, I reverted patch 4 and applied your new patch. No problems; works great!
Comment 15 Paulo Zanoni 2016-10-24 20:01:38 UTC
Patch merged to drm-intel-next-queued and marked for inclusion in stable. Closing bug. If the problem persists, please reopen.

Thanks a lot for bisecting and testing the patches!
Comment 16 Jari Tahvanainen 2016-12-13 08:12:13 UTC
closing resolved+fixed. Verified by reporter.

