Bug 108926 - [CI] igt@runner@aborted - fail - Previous test: pm_rpm (module-reload), check: Corrupted low memory at 000000007cfb63b3 (1000 phys) = 1864a00118649001
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-03 13:38 UTC by Martin Peres
Modified: 2018-12-04 11:48 UTC

See Also:
i915 platform: I915G
i915 features: power/suspend-resume


Attachments
Dmesg during the execution of this test (2.06 MB, text/x-log)
2018-12-03 14:12 UTC, Martin Peres

Description Martin Peres 2018-12-03 13:38:28 UTC
We have been very reliably hitting the following error message on gdg:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5186_102/fi-gdg-551/igt@runner@aborted.html

<3>[  308.655471] check: Corrupted low memory at 000000007cfb63b3 (1000 phys) = 1864a00118649001
<3>[  308.655596] check: Corrupted low memory at 00000000492f0651 (1008 phys) = 1864c0011864b001
<3>[  308.655604] check: Corrupted low memory at 00000000fe0b4fbc (1010 phys) = 1864e0011864d001
<3>[  308.655612] check: Corrupted low memory at 00000000b66fdaa9 (1018 phys) = 186500011864f001
<3>[  308.655620] check: Corrupted low memory at 00000000f720d8c9 (1020 phys) = 1865200118651001
<3>[  308.655627] check: Corrupted low memory at 0000000018a04d7a (1028 phys) = 1865400118653001
<3>[  308.655634] check: Corrupted low memory at 00000000d36a3096 (1030 phys) = 1865600118655001
<3>[  308.655642] check: Corrupted low memory at 0000000034e45896 (1038 phys) = 1865800118657001
<3>[  308.655649] check: Corrupted low memory at 0000000019fbe82c (1040 phys) = 1865a00118659001
<3>[  308.655656] check: Corrupted low memory at 000000008509ff77 (1048 phys) = 1865c0011865b001
<3>[  308.655663] check: Corrupted low memory at 000000008b445056 (1050 phys) = 1865e0011865d001
<3>[  308.655671] check: Corrupted low memory at 00000000856cf92c (1058 phys) = 186600011865f001
<3>[  308.655678] check: Corrupted low memory at 0000000094da0e4d (1060 phys) = 1866200118661001
<3>[  308.655685] check: Corrupted low memory at 000000001922f546 (1068 phys) = 1866400118663001
<3>[  308.655693] check: Corrupted low memory at 00000000e71e4205 (1070 phys) = 1866600118665001
<3>[  308.655700] check: Corrupted low memory at 0000000014e7dd31 (1078 phys) = 1866800118667001
<3>[  308.655707] check: Corrupted low memory at 00000000a7bbeab6 (1080 phys) = 1866a00118669001
<3>[  308.655714] check: Corrupted low memory at 0000000060f6520b (1088 phys) = 1866c0011866b001
<3>[  308.655722] check: Corrupted low memory at 00000000f79fdab7 (1090 phys) = 1866e0011866d001
<3>[  308.655729] check: Corrupted low memory at 00000000952b1d17 (1098 phys) = 186700011866f001
<3>[  308.655736] check: Corrupted low memory at 00000000be69864c (10a0 phys) = 1867200118671001
<3>[  308.655743] check: Corrupted low memory at 000000002fef4fad (10a8 phys) = 1867400118673001
<3>[  308.655751] check: Corrupted low memory at 00000000f296e6e8 (10b0 phys) = 1867600118675001
<3>[  308.655758] check: Corrupted low memory at 00000000b560cfea (10b8 phys) = 1867800118677001
<3>[  308.655765] check: Corrupted low memory at 000000002b51c646 (10c0 phys) = 1867a00118679001
<3>[  308.655773] check: Corrupted low memory at 00000000c5732bff (10c8 phys) = 1867c0011867b001
<3>[  308.655780] check: Corrupted low memory at 0000000017d83472 (10d0 phys) = 1867e0011867d001
<3>[  308.655787] check: Corrupted low memory at 00000000dac8bb35 (10d8 phys) = 187000011867f001
<3>[  308.655794] check: Corrupted low memory at 0000000008419ee5 (10e0 phys) = 1870200118701001
<3>[  308.655802] check: Corrupted low memory at 000000001ca88fd9 (10e8 phys) = 1870400118703001
[...]
Comment 1 Martin Peres 2018-12-03 13:41:18 UTC
This was caught in 50% of the runs on CI_DRM_5186.
Comment 4 Tomi Sarvela 2018-12-03 13:52:49 UTC
Oops, the earlier link was wrong.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5186_102/fi-gdg-551/dmesg0.log
Comment 5 Chris Wilson 2018-12-03 13:53:19 UTC
So PTE entries? Starting from 0x18649001. No obvious match for that page range in boot.log though.

I honestly expect it's the BIOS doing strange things after we unload.
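
To make the "PTE entries" reading easier to check, here is a minimal sketch that splits each reported 64-bit value into two 32-bit words and separates the page-aligned address from the low flag bits. It assumes a little-endian layout (so the low half of the printed value sits at the lower physical address) and assumes the words are 32-bit entries with a 4 KiB page address in the upper 20 bits. Neither assumption is confirmed in this report, and the helper names `split_words` and `describe` are made up for illustration.

```python
import re

# Matches the "check: Corrupted low memory" lines quoted in the description.
LINE_RE = re.compile(
    r"check: Corrupted low memory at [0-9a-f]+ "
    r"\((?P<phys>[0-9a-f]+) phys\) = (?P<value>[0-9a-f]+)"
)


def split_words(line):
    """Return [(phys_addr, 32-bit word), ...] for one dmesg line.

    Assumption: little-endian, so the low 32 bits of the printed
    64-bit value live at `phys` and the high 32 bits at `phys + 4`.
    """
    m = LINE_RE.search(line)
    if m is None:
        return []
    phys = int(m.group("phys"), 16)
    value = int(m.group("value"), 16)
    return [(phys, value & 0xFFFFFFFF), (phys + 4, value >> 32)]


def describe(word):
    """Split a word into (page address, low 12 flag bits).

    Assumption: the word is a 32-bit entry pointing at a 4 KiB page.
    """
    return word & 0xFFFFF000, word & 0xFFF


if __name__ == "__main__":
    sample = [
        "<3>[  308.655471] check: Corrupted low memory at 000000007cfb63b3 (1000 phys) = 1864a00118649001",
        "<3>[  308.655596] check: Corrupted low memory at 00000000492f0651 (1008 phys) = 1864c0011864b001",
    ]
    for line in sample:
        for phys, word in split_words(line):
            page, flags = describe(word)
            print(f"phys {phys:#06x}: word {word:#010x} page {page:#010x} flags {flags:#x}")
```

Under those assumptions, the first two lines decode to words at 0x1000, 0x1004, 0x1008 and 0x100c pointing at pages 0x18649000, 0x1864a000, 0x1864b000 and 0x1864c000, each with only bit 0 set. That is a run of consecutive pages, consistent with the guess above.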
Comment 6 Martin Peres 2018-12-03 14:12:58 UTC
Created attachment 142702
Dmesg during the execution of this test

Also hit with igt@i915_module_load@reload-with-fault-injection: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5186_114/fi-gdg-551/igt@runner@aborted.html

Aborting.
Previous test: i915_module_load (reload-with-fault-injection)
Next test: pm_rpm (module-reload)

Kernel tainted (0x240 -- 200)
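
For reference, the taint mask quoted above can be decoded with a short sketch like the one below. The bit meanings follow the kernel's tainted-kernels documentation. Reading the trailing "200" as the subset of bits the runner treats as abort-worthy is an assumption, not something stated in this report. `decode_taint` is a hypothetical helper name.

```python
# Taint bit -> (letter, meaning), per the kernel's tainted-kernels documentation.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    2: ("S", "kernel running on an out of specification system"),
    3: ("R", "module was force unloaded"),
    4: ("M", "processor reported a Machine Check Exception"),
    5: ("B", "bad page referenced or unexpected page flags"),
    6: ("U", "taint requested by userspace application"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued warning"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "workaround for bug in platform firmware applied"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel has been live patched"),
}


def decode_taint(mask):
    """Return [(bit, letter, meaning), ...] for every known bit set in mask."""
    return [
        (bit, letter, meaning)
        for bit, (letter, meaning) in sorted(TAINT_FLAGS.items())
        if mask & (1 << bit)
    ]


if __name__ == "__main__":
    for bit, letter, meaning in decode_taint(0x240):
        print(f"bit {bit} ({letter}): {meaning}")
```

For 0x240 this reports bit 6 (U, taint requested by userspace) and bit 9 (W, kernel issued a warning). 0x200 on its own is the W bit.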
Comment 7 Chris Wilson 2018-12-04 11:48:42 UTC
Seems like it's prevalent on drm-intel-fixes runs, but rare enough that it hasn't struck a BAT run?

