Bug 66726

Summary: [PNV Regression]*ERROR* conflict detected with stolen region: [0x7f800000 - 0x80000000]
Product: DRI
Component: DRM/Intel
Status: CLOSED FIXED
Severity: major
Priority: high
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Reporter: lu hua <huax.lu>
Assignee: Jesse Barnes <jbarnes>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: przanoni, xunx.fang, yangweix.shui
Attachments (description, flags):
dmesg (flags: none)
iomem (flags: none)
fixup stolen detection on pnv (flags: none)
dump conflicting region (flags: none)
dmesg with patch (flags: none)
dmesg (flags: none)
dump conflicting region v2 (flags: none)
dmesg (flags: none)
debug patch v3 (flags: none)
dmesg with debug patch v3 (flags: none)
dmesg with patch 030775/030774 (flags: none)
iomem with patch 030775/030774 (flags: none)

Description lu hua 2013-07-09 07:06:52 UTC
Created attachment 82209 [details]
dmesg

System Environment:
--------------------------
Platform:       Pineview
Kernel:         drm-intel-fixes 8bbbb45b2125a28ea1de657e7893a521b44b60c3

Bug detailed description:
-----------------------------
It happens on Pineview with the drm-intel-fixes kernel. It works well on the drm-intel-next-queued kernel.

The first bad commit could be any of:
035dc1e0f9008b48630e02bf0eaa7cc547416d1d
446f8d81ca2d9cefb614e87f2fabcc996a9e4e7e
15fdeefeb69a3a21fb483d1621517c2c8e0cf31a

output:
Using 768 1MiB buffers
Verifying initialisation...done
Cyclic blits, forward...verifying...done
Cyclic blits, backward...verifying...done
Random blits...verifying...done

Reproduce steps:
----------------------------
1.  ./gen3_render_mixed_blits
Comment 1 Daniel Vetter 2013-07-09 07:54:38 UTC
This happens at boot-up, not when running a testcase. It's a new check added with

commit 15fdeefeb69a3a21fb483d1621517c2c8e0cf31a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 4 12:28:35 2013 +0100

    drm/i915: Verify that our stolen memory doesn't conflict

Can you please attach the contents of /proc/iomem for this machine?
Comment 2 lu hua 2013-07-09 08:02:24 UTC
Created attachment 82213 [details]
iomem
Comment 3 Chris Wilson 2013-07-09 10:12:41 UTC
In this case, the memory region should be reserved I think by

[    0.766505] pci_bus 0000:00: root bus resource [mem 0x7f700000-0xffffffff]

So it is a genuine busy, and we should try and filter this warning. Or somehow make a child resource of it.
Comment 4 Daniel Vetter 2013-07-09 12:41:43 UTC
(In reply to comment #3)
> In this case, the memory region should be reserved I think by
> 
> [    0.766505] pci_bus 0000:00: root bus resource [mem 0x7f700000-0xffffffff]
> 
> So it is a genuine busy, and we should try and filter this warning. Or
> somehow make a child resource of it.

The following explicit resources are in that range:

  7f700000-7f8fffff : PCI Bus 0000:01
  7f900000-7f900fff : Intel Flush Page
  7f904000-7f907fff : i915 MCHBAR

That feels like a real conflict, or alternatively our flush page setup in intel-gtt.c is totally bogus. I'll attach the patch to rework the stolen detection to check this.
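
The overlap described above can be checked by hand with a small stand-alone sketch. This is only an illustration, not the driver's actual check: the struct and function names are hypothetical, the three resources are copied from the /proc/iomem excerpt above, and the stolen range assumes that the upper bound printed in the error message (0x80000000) is exclusive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An address range with an inclusive end, as printed in /proc/iomem.
 * Hypothetical type, used only for this illustration. */
struct mem_range {
    uint64_t start;
    uint64_t end; /* inclusive */
    const char *name;
};

/* True when the two inclusive ranges share at least one address. */
static bool ranges_overlap(const struct mem_range *a, const struct mem_range *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/* Count how many entries of `table` overlap `probe`. */
static size_t count_conflicts(const struct mem_range *probe,
                              const struct mem_range *table, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (ranges_overlap(probe, &table[i]))
            hits++;
    return hits;
}

/* Stolen range from the error message. Assumption: the printed upper
 * bound 0x80000000 is exclusive, so the last byte is 0x7fffffff. */
static const struct mem_range stolen = {
    0x7f800000ULL, 0x7fffffffULL, "stolen (assumed bounds)"
};

/* The three explicit resources listed in this comment. */
static const struct mem_range pnv_resources[] = {
    { 0x7f700000ULL, 0x7f8fffffULL, "PCI Bus 0000:01" },
    { 0x7f900000ULL, 0x7f900fffULL, "Intel Flush Page" },
    { 0x7f904000ULL, 0x7f907fffULL, "i915 MCHBAR" },
};
```

Under these assumptions, all three listed resources intersect the stolen range, which is consistent with reading this as a real conflict.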
Comment 5 Daniel Vetter 2013-07-09 12:42:41 UTC
Created attachment 82230 [details] [review]
fixup stolen detection on pnv

Please test this patch.
Comment 6 Daniel Vetter 2013-07-09 14:25:07 UTC
Adding Paulo since he reported a conflict on his hsw machine, too.
Comment 7 Daniel Vetter 2013-07-09 14:32:47 UTC
Fyi I've moved the stolen range request_region check from -fixes to dinq. There's simply too much fallout to dump it into -fixes right now. We need to tackle those issues first.
Comment 8 Daniel Vetter 2013-07-09 14:33:58 UTC
Created attachment 82236 [details] [review]
dump conflicting region

Please apply this debug patch to a broken kernel and then please attach dmesg from booting it. Hopefully this explains why the request_region fails.
Comment 9 lu hua 2013-07-10 05:53:40 UTC
Created attachment 82253 [details]
dmesg with patch

Test with the patch.
*ERROR* conflict detected with stolen region: [0x7f800000 - 0x80000000] goes away.

Following error in dmesg:
 dmesg | grep ERROR
[   25.794381] rmmod[3343]: ERROR: Module scsi_wait_scan does not exist in /proc/modules
[   25.821654] rmmod[3348]: ERROR: Module scsi_wait_scan does not exist in /proc/modules
Comment 10 Daniel Vetter 2013-07-10 06:06:56 UTC
(In reply to comment #9)
> Created attachment 82253 [details]
> dmesg with patch
> 
> Test with the patch.
> *ERROR* conflict detected with stolen region: [0x7f800000 - 0x80000000] goes
> away.

Is this really on latest -nightly? Note that I've removed the patch which caused this ERROR output from -fixes, so if you used that as a baseline then you need to retest.
Comment 11 lu hua 2013-07-10 06:39:13 UTC
(In reply to comment #9)
> Created attachment 82253 [details]
> dmesg with patch
> 
> Test with the patch.
> *ERROR* conflict detected with stolen region: [0x7f800000 - 0x80000000] goes
> away.
> 
> Following error in dmesg:
>  dmesg | grep ERROR
> [   25.794381] rmmod[3343]: ERROR: Module scsi_wait_scan does not exist in
> /proc/modules
> [   25.821654] rmmod[3348]: ERROR: Module scsi_wait_scan does not exist in
> /proc/modules

Sorry, I tested on the -dinq kernel.
Comment 12 lu hua 2013-07-10 06:57:16 UTC
It also happens on SNB.

Tested on the -nightly branch with the patch; the error still occurs.

-nightly commit: 88de8dd7d0415bf7a13df6cfdf15a612dcde822f (Merge: 09fa6ed 18097b9).

dmesg -r | egrep "<[1-3]>" |grep drm
<3>[    1.006430] [drm:i915_stolen_to_physical] *ERROR* conflict detected with stolen region: [0xcba00000 - 0xcfa00000]
Comment 13 lu hua 2013-07-10 06:57:40 UTC
Created attachment 82256 [details]
dmesg
Comment 14 Daniel Vetter 2013-07-10 08:57:50 UTC
Created attachment 82259 [details] [review]
dump conflicting region v2

Oops, the debug patch was broken and didn't actually dump the information we're interested in. Can you please apply this updated patch on top of latest -nightly and grab a new dmesg?
Comment 15 lu hua 2013-07-11 07:25:10 UTC
Created attachment 82316 [details]
dmesg

Tested with the patch. It still happens.
Comment 16 Daniel Vetter 2013-07-11 09:46:26 UTC
Ignoring that I fail at printf, we have

conflict with resource 0xcb000000 - 0xcbffffff: RAM buffer, flags=0x80000000 (IORESOURCE_BUSY)

[drm:i915_stolen_to_physical] *ERROR* conflict detected with stolen region: [0xcba00000 - 0xcfa00000]

But that's now on an snb machine, which makes this bug report massively confusing, since those values don't match up at all for PNV.

Lu Hua, can you please file a new bug report for SNB? Please attach the dmesg taken with the debug patch (I'll attach a v3 shortly with fixed-up output) and /proc/iomem.

We need to restrict ourselves to this specific pnv machine here to avoid confusion.
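
For reference, the numbers in the SNB debug output above can be decoded with a short sketch. It is illustrative only: the helper names are hypothetical, the flag value mirrors IORESOURCE_BUSY from the kernel's include/linux/ioport.h, and the stolen range again assumes that the printed upper bound (0xcfa00000) is exclusive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as IORESOURCE_BUSY in include/linux/ioport.h; renamed here
 * so that this stand-alone sketch does not pretend to be kernel code. */
#define SKETCH_IORESOURCE_BUSY 0x80000000u

/* True when the resource flags carry the busy bit. */
static bool resource_is_busy(uint32_t flags)
{
    return (flags & SKETCH_IORESOURCE_BUSY) != 0;
}

/* Number of bytes shared by two ranges with inclusive ends, or 0 when
 * the ranges do not intersect. */
static uint64_t overlap_bytes(uint64_t a_start, uint64_t a_end,
                              uint64_t b_start, uint64_t b_end)
{
    uint64_t lo = a_start > b_start ? a_start : b_start;
    uint64_t hi = a_end < b_end ? a_end : b_end;

    return lo <= hi ? hi - lo + 1 : 0;
}
```

With the stolen range taken as 0xcba00000-0xcf9fffff and the "RAM buffer" resource as 0xcb000000-0xcbffffff, overlap_bytes() yields 0x600000, meaning the first 6 MiB of the stolen range fall inside the busy resource.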
Comment 17 Daniel Vetter 2013-07-11 09:50:20 UTC
Created attachment 82326 [details] [review]
debug patch v3

Please use this patch here to grab a new drm debug dmesg for both this bug report (about pnv) and the new bug report for snb.

Thanks, Daniel
Comment 18 lu hua 2013-07-12 06:14:53 UTC
Created attachment 82360 [details]
dmesg with debug patch v3

Reported Bug 66844 about SNB.
Comment 19 Daniel Vetter 2013-07-12 12:48:05 UTC
This still looks like a legit conflict ... I guess we should disable stolen if we detect such a case.
Comment 20 Daniel Vetter 2013-07-23 15:57:28 UTC
Jesse Barnes volunteered to write an rfc patch to reserve stolen in early platform code and submit it to the x86 maintainers for feedback.
Comment 21 Jesse Barnes 2013-07-24 17:24:09 UTC
Can you try these two patches and then post your dmesg and /proc/iomem again?

http://lists.freedesktop.org/archives/intel-gfx/2013-July/030775.html
http://lists.freedesktop.org/archives/intel-gfx/2013-July/030774.html
Comment 22 lu hua 2013-07-25 06:10:41 UTC
It doesn't happen on the drm-intel-fixes kernel (363202bb22467ea1de6dd2).
Closing it.
Comment 23 lu hua 2013-07-25 06:11:10 UTC
Verified. Fixed.
Comment 24 Daniel Vetter 2013-07-25 07:09:18 UTC
The patch which detects stolen conflicts is moved to dinq, and the bug should still be there. Please recheck that and then test the two patches from Jesse.
Comment 25 lu hua 2013-07-25 07:40:01 UTC
(In reply to comment #21)
> Can you try these two patches and then post your dmesg and /proc/iomem again?
> 
> http://lists.freedesktop.org/archives/intel-gfx/2013-July/030775.html
> http://lists.freedesktop.org/archives/intel-gfx/2013-July/030774.html

Fixed by these 2 patches.
Comment 26 lu hua 2013-07-25 07:41:10 UTC
Created attachment 82981 [details]
dmesg with patch 030775/030774
Comment 27 lu hua 2013-07-25 07:41:44 UTC
Created attachment 82982 [details]
iomem with patch 030775/030774
Comment 28 Daniel Vetter 2013-08-29 22:04:28 UTC
Should be fixed on dinq with

commit ee75f952998e8365da90300cf1d421fc6c7dafa2
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri Jul 26 13:32:52 2013 -0700

    x86: add early quirk for reserving Intel graphics stolen memory v5
Comment 29 lu hua 2013-08-30 06:52:10 UTC
Verified. Fixed.
Comment 30 Elizabeth 2017-10-06 14:45:13 UTC
Closing old verified.
