Bug 97621

Summary: [BAT] [SKL] System boots with [drm:i915_stolen_to_physical] *ERROR* conflict detected with stolen region: [0xc6000000 - 0xc8000000] on dmesg
Product: DRI Reporter: Jari Tahvanainen <jari.tahvanainen>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED NOTOURBUG QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker    
Priority: highest CC: amanoel, intel-gfx-bugs, noloader, sad_bunny
Version: DRI git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: SKL i915 features: GEM/Other
Attachments:
Description                                    Flags
dmesg (boot)                                   none
Boot dmesg with 0704 (working) BIOS version    none
/proc/iomem on BIOS v. 0704                    none
/proc/iomem on BIOS v. 2002                    none
dmesg on BIOS v. 0704                          none
dmesg on BIOS v. 2002                          none

Description Jari Tahvanainen 2016-09-07 07:00:32 UTC
Created attachment 126268 [details]
dmesg (boot)

The dmesg error
[drm:i915_stolen_to_physical] *ERROR* conflict detected with stolen region: [0xc6000000 - 0xc8000000]

started to show up on CI_DRM/fi-skl-6700k at boot after the BIOS update to:
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Table at 0xBFED6000.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
	Vendor: American Megatrends Inc.
	Version: 0701
	Release Date: 01/13/2016
	Address: 0xF0000
	Runtime Size: 64 kB
	ROM Size: 16384 kB
	Characteristics:
		PCI is supported
		APM is supported
		BIOS is upgradeable
		BIOS shadowing is allowed
		Boot from CD is supported
		Selectable boot is supported
		BIOS ROM is socketed
		EDD is supported
		5.25"/1.2 MB floppy services are supported (int 13h)
		3.5"/720 kB floppy services are supported (int 13h)
		3.5"/2.88 MB floppy services are supported (int 13h)
		Print screen service is supported (int 5h)
		8042 keyboard services are supported (int 9h)
		Serial services are supported (int 14h)
		Printer services are supported (int 17h)
		ACPI is supported
		USB legacy is supported
		BIOS boot specification is supported
		Targeted content distribution is supported
		UEFI is supported
	BIOS Revision: 5.11

With the previous BIOS, the *ERROR* was not visible in dmesg.
Comment 1 Jari Tahvanainen 2016-09-09 11:22:13 UTC
This failure is also visible when executing <igt-root>/tests/drv_module_reload_basic on an SKL machine with the newest BIOS.
Comment 2 Tomi Sarvela 2016-09-09 14:09:21 UTC
Created attachment 126375 [details]
Boot dmesg with 0704 (working) BIOS version

Dmesg before SKL BIOS update
Comment 3 Chris Wilson 2016-09-09 14:26:36 UTC
So the new BIOS adds a conflicting entry:

[    0.000000] BIOS-e820: [mem 0x00000000c5300000-0x00000000c7ffffff] reserved

with which the kernel manages to zap our

[    0.000000] Reserving Intel graphics memory at 0x00000000c6000000-0x00000000c7ffffff

However, in the old BIOS,

[    0.000000] Reserving Intel graphics memory at 0x00000000c1000000-0x00000000c2ffffff

which was not marked up as reserved inside the e820.

It seems that e820_reserve_resources_late() is causing the conflict, but our earlier intel_graphics_stolen() should ideally have fixed up the e820 map.
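
As a rough illustration of the overlap described above, the ranges from the quoted dmesg lines can be compared with a few lines of Python. This is only a sketch: the `Range` type and the `overlap` helper are hypothetical names introduced here, not kernel code, and the addresses are copied from the lines quoted in this comment.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    """Physical address range with an inclusive end, as printed in dmesg."""
    start: int
    end: int


def overlap(a: Range, b: Range) -> Optional[Range]:
    """Return the intersection of two inclusive ranges, or None if disjoint."""
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    return Range(start, end) if start <= end else None


# Values copied from the dmesg lines quoted in this comment.
e820_reserved_new = Range(0xC5300000, 0xC7FFFFFF)  # BIOS-e820 entry, new BIOS
stolen_new = Range(0xC6000000, 0xC7FFFFFF)         # stolen memory, new BIOS
stolen_old = Range(0xC1000000, 0xC2FFFFFF)         # stolen memory, old BIOS

conflict = overlap(e820_reserved_new, stolen_new)
if conflict is not None:
    print(f"conflict: {conflict.start:#x}-{conflict.end:#x}")
```

With the new BIOS the intersection is the whole stolen range, i.e. stolen memory lies entirely inside the reserved e820 entry, while the old stolen range does not touch that entry at all.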
Comment 4 Chris Wilson 2016-09-09 14:39:29 UTC
Try:

diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h
index 9dafe59cf6e2..5b40b4abef35 100644
--- a/arch/x86/include/uapi/asm/e820.h
+++ b/arch/x86/include/uapi/asm/e820.h
@@ -45,6 +45,8 @@
  */
 #define E820_PRAM      12
 
+#define E820_RESERVED_GFX 64
+
 /*
  * reserved RAM used by kernel itself
  * if CONFIG_INTEL_TXT is enabled, memory of this type will be
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index f306698a4cb4..e84f6afbb578 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -546,7 +546,7 @@ intel_graphics_stolen(int num, int slot, int func,
               &base, &end);
 
        /* Mark this space as reserved */
-       e820_add_region(base, size, E820_RESERVED);
+       e820_add_region(base, size, E820_RESERVED_GFX);
        sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
 }
Comment 5 Jari Tahvanainen 2016-09-12 13:14:53 UTC
(In reply to Chris Wilson from comment #4)
> Try:
> 
> diff --git a/arch/x86/include/uapi/asm/e820.h
> b/arch/x86/include/uapi/asm/e820.h
> index 9dafe59cf6e2..5b40b4abef35 100644
> --- a/arch/x86/include/uapi/asm/e820.h
> +++ b/arch/x86/include/uapi/asm/e820.h
> @@ -45,6 +45,8 @@
>   */
>  #define E820_PRAM      12
>  
> +#define E820_RESERVED_GFX 64
> +
>  /*
>   * reserved RAM used by kernel itself
>   * if CONFIG_INTEL_TXT is enabled, memory of this type will be
> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> index f306698a4cb4..e84f6afbb578 100644
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -546,7 +546,7 @@ intel_graphics_stolen(int num, int slot, int func,
>                &base, &end);
>  
>         /* Mark this space as reserved */
> -       e820_add_region(base, size, E820_RESERVED);
> +       e820_add_region(base, size, E820_RESERVED_GFX);
>         sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
>  }

I applied the changes, recompiled, and rebooted the SKL machine. The outcome is still the same (the conflict still causes the *ERROR*) in dmesg:
[    0.000000] Reserving Intel graphics memory at 0x00000000c6000000-0x00000000c7ffffff
...
[    1.861126] [drm:i915_stolen_to_physical] *ERROR* conflict detected with stolen region: [0xc6000000 - 0xc8000000]
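
To cross-check that the two dmesg lines above describe the same region, a small Python sketch can extract both ranges. The regular expressions are written only for the two message formats quoted here, and the helper name `find_range` is hypothetical. It assumes the i915 message prints an exclusive end address, which is inferred solely from these two lines (0xc8000000 is 0xc7ffffff + 1).

```python
import re
from typing import Optional, Tuple

# Patterns written for the two message formats quoted above; other kernel
# versions may word these lines differently.
EARLY_RE = re.compile(
    r"Reserving Intel graphics memory at (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)")
I915_RE = re.compile(
    r"conflict detected with stolen region: "
    r"\[(0x[0-9a-fA-F]+) - (0x[0-9a-fA-F]+)\]")


def find_range(pattern: re.Pattern, log: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) from the first matching log line, or None."""
    m = pattern.search(log)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


DMESG = """\
[    0.000000] Reserving Intel graphics memory at 0x00000000c6000000-0x00000000c7ffffff
[    1.861126] [drm:i915_stolen_to_physical] *ERROR* conflict detected with stolen region: [0xc6000000 - 0xc8000000]
"""

early = find_range(EARLY_RE, DMESG)
reported = find_range(I915_RE, DMESG)
if early and reported:
    # ASSUMPTION: the early quirk prints an inclusive end, the i915 error an
    # exclusive end, so the two match when early_end + 1 == reported_end.
    same_region = early[0] == reported[0] and early[1] + 1 == reported[1]
    size_mib = (reported[1] - reported[0]) // (1024 * 1024)
    print(f"same region: {same_region}, size: {size_mib} MiB")
```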

BTW: the latest code in drm-intel-nightly has intel_graphics_stolen() modified compared to your example:
commit ee0629cfd3c16c716801c84e939ff5db5e23f54d
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date:   Fri Apr 22 13:45:49 2016 +0300
...
-static void __init intel_graphics_stolen(int num, int slot, int func)
+static void __init
+intel_graphics_stolen(int num, int slot, int func,
+                     const struct intel_early_ops *early_ops)
 ...
Comment 6 Jari Tahvanainen 2016-09-14 06:53:25 UTC
Priority+Severity changed due to impact on Patchwork GFX CI Build Verification Tests (aka BAT) --> Dmesg-warn for igt@drv_module_reload_basic on fi-skl-6700k
Comment 7 dog 2016-10-11 18:58:08 UTC
Here is more detail on the specific motherboard showing the issue:

Asus Z170M-PLUS Intel Z170 LGA1151 micro-ATX-motherboard

Dmidecode shows:
Base Board Information
	Manufacturer: ASUSTeK COMPUTER INC.
	Product Name: Z170M-PLUS
Comment 8 dog 2016-10-12 04:55:40 UTC
After checking this motherboard, we found that there is a new BIOS release (v2002, updated 9/29), detailed below. Can someone try the new BIOS?

ASUS has not published a sighting report for this issue.

https://www.asus.com/Motherboards/Z170M-PLUS/HelpDesk_Download/

New BIOS:

Version 2002
Description	Z170M-PLUS BIOS 2002
File Size	6.85 Mbytes update 2016/09/29

Download from http://dlcdnet.asus.com/pub/ASUS/mb/LGA1151/Z170M-PLUS/Z170M-PLUS-ASUS-2002.zip
Comment 9 Joonas Lahtinen 2016-10-12 13:00:53 UTC
Created attachment 127242 [details]
/proc/iomem on BIOS v. 0704
Comment 10 Joonas Lahtinen 2016-10-12 13:01:23 UTC
Created attachment 127243 [details]
/proc/iomem on BIOS v. 2002
Comment 11 Joonas Lahtinen 2016-10-12 13:02:03 UTC
Created attachment 127244 [details]
dmesg on BIOS v. 0704
Comment 12 Joonas Lahtinen 2016-10-12 13:02:31 UTC
Created attachment 127245 [details]
dmesg on BIOS v. 2002
Comment 13 Joonas Lahtinen 2016-10-12 13:18:10 UTC
Thanks to Tomi for using DediProg to flash the older BIOS for the dmesg and /proc/iomem dumps. They were taken on exactly the same kernel and machine.

There's no difference between 1805 and 2002, which leads us to conclude that the OEM changed the memory mapping between 0704 and 1805, and since then there has been a BIOS reservation from PNP 00:07 conflicting with the stolen memory area:

$ cat /sys/bus/pnp/devices/00\:07/firmware_node/path
\_SB_.PCI0.PDRC
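
The same lookup can be done for every PNP device at once. The following Python sketch reads the `firmware_node/path` file shown in the command above for each entry under `/sys/bus/pnp/devices`; the function name `pnp_firmware_paths` and its `root` parameter are hypothetical, and devices without a firmware node are simply skipped.

```python
from pathlib import Path
from typing import Dict


def pnp_firmware_paths(root: str = "/sys/bus/pnp/devices") -> Dict[str, str]:
    """Map each PNP device name (e.g. "00:07") to its ACPI firmware path."""
    paths: Dict[str, str] = {}
    base = Path(root)
    if not base.is_dir():
        return paths
    for dev in sorted(base.iterdir()):
        node = dev / "firmware_node" / "path"
        try:
            paths[dev.name] = node.read_text().strip()
        except OSError:
            # No firmware node for this device, or it is not readable.
            continue
    return paths


if __name__ == "__main__":
    for name, acpi_path in pnp_firmware_paths().items():
        print(name, acpi_path)
```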

PNP device 00:07 claims the same memory areas (with exactly matching addresses) in 0704 as in the more recent versions, but in the newer ones there is an additional reservation in iomem (c5400000-c7fffffe : pnp 00:07) which conflicts with stolen memory.

I would classify this as a UEFI vendor bug (it could be our VBIOS too), because we have no obvious way of correlating the reservation to stolen memory. The claimed stolen memory region happens to sit in the middle of the reservation and is in no way aligned with it.
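
To make the placement concrete, here is a minimal Python sketch that parses /proc/iomem-style lines and reports where the stolen range sits relative to a conflicting reservation. The sample text contains only the single iomem line quoted above (the full attachments are not reproduced here), the stolen range is taken from the dmesg lines earlier in this bug, and the helper names `parse_iomem` and `conflicts` are hypothetical.

```python
import re
from typing import List, Tuple

Entry = Tuple[int, int, str]  # (start, inclusive end, owner name)

IOMEM_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def parse_iomem(text: str) -> List[Entry]:
    """Parse lines of the form 'start-end : name' as found in /proc/iomem."""
    entries: List[Entry] = []
    for line in text.splitlines():
        m = IOMEM_RE.match(line)
        if m:
            entries.append(
                (int(m.group(1), 16), int(m.group(2), 16), m.group(3).strip()))
    return entries


def conflicts(entries: List[Entry], start: int, end: int) -> List[Entry]:
    """Return the entries that overlap the inclusive range [start, end]."""
    return [e for e in entries if e[0] <= end and start <= e[1]]


# Only the line quoted in this comment; a real check would read /proc/iomem.
SAMPLE = "c5400000-c7fffffe : pnp 00:07\n"
STOLEN_START, STOLEN_END = 0xC6000000, 0xC7FFFFFF  # from the dmesg above

hits = conflicts(parse_iomem(SAMPLE), STOLEN_START, STOLEN_END)
for res_start, res_end, name in hits:
    print(f"{name}: stolen starts {STOLEN_START - res_start:#x} bytes into "
          f"the reservation and ends {STOLEN_END - res_end:+d} byte(s) "
          f"relative to its end")
```

With these numbers, stolen memory starts 0xc00000 bytes (12 MiB) into the PNP reservation and ends one byte past it, so neither boundary of the reservation coincides with a boundary of the stolen range.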
Comment 14 yann 2016-10-12 13:32:59 UTC
Following comment #13, closing as not our bug.
Comment 15 Jari Tahvanainen 2016-11-28 12:09:23 UTC
*** Bug 98750 has been marked as a duplicate of this bug. ***
Comment 16 Jari Tahvanainen 2016-11-28 12:18:11 UTC
*** Bug 98735 has been marked as a duplicate of this bug. ***
Comment 17 Jeffrey Walton 2017-03-10 20:46:34 UTC
Not sure if it changes things... ASUS released an update for the AMI UEFI on the Z170M-E DE board. It's now version 2202, dated 02/15/2017. The issue is still present.
