Summary: | [IGT] gem_shrink subtest mmap-gtt oom
---|---
Product: | DRI
Reporter: | Hector Velazquez <hector.franciscox.velazquez.suriano>
Component: | DRM/Intel
Assignee: | Francesco Balestrieri <francesco.balestrieri>
Status: | CLOSED INVALID
QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | major
Priority: | high
CC: | intel-gfx-bugs
Version: | DRI git
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
Whiteboard: | ReadyForDev
i915 platform: | CFL
i915 features: | GEM/Other
Bug Depends on: | 101857
Bug Blocks: |
Attachments: |

This test was failing on CFL QA:

igt@gem_shrink@mmap-gtt

====================================================
output
====================================================
IGT-Version: 1.20-gc0be331 (x86_64) (Linux: 4.15.0-rc4-drm-intel-qa-ww51-commit-bf5cdf9+ x86_64)
(gem_shrink:2116) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(gem_shrink:2116) intel-chipset-DEBUG: Test requirement passed: pci_dev
Using 125 processes and 128MiB per process
(gem_shrink:2116) intel-os-DEBUG: Checking 125 surfaces of size 134217728 bytes (total 16777281536) against RAM + swap
(gem_shrink:2116) drmtest-DEBUG: Test requirement passed: !(fd<0)
(gem_shrink:2116) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_shrink:2116) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'
(gem_shrink:2116) intel-os-DEBUG: Test requirement passed: __intel_check_memory(count, size, mode, &required, &total)
(gem_shrink:2116) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(gem_shrink:2116) drmtest-DEBUG: Test requirement passed: !(fd<0)
(gem_shrink:2116) drmtest-DEBUG: Test requirement passed: is_i915_device(fd) && has_known_intel_chipset(fd)
(gem_shrink:2116) ioctl-wrappers-DEBUG: Test requirement passed: err == 0
(gem_shrink:2116) DEBUG: Test requirement passed: nengine
(gem_shrink:2116) igt-core-DEBUG: Starting subtest: mmap-gtt
Subtest mmap-gtt failed.
No log.
child 70 died with signal 9, Killed
Subtest mmap-gtt: FAIL (142.110s)
(gem_shrink:2116) igt-core-DEBUG: Exiting with status code 137
(gem_shrink:2116) igt-debugfs-DEBUG: Opening debugfs directory '/sys/kernel/debug/dri/0'

This is my configuration:

======================================
Graphic stack
======================================
Component: drm
tag: libdrm-2.4.88-42-g831036a
commit: 831036a6f62005da9fb4a75fe043bd96ce672d27

Component: cairo
tag: 1.15.8-73-g903b0de
commit: 903b0de539844c144c63ea57c30e84a23360c290

Component: intel-gpu-tools
tag: intel-gpu-tools-1.20-232-gc0be331
commit: c0be3310715e2f744b892c51f09e62273bcc8e57

Component: piglit
tag: piglit-v1
commit: 64775cc0f59820c4d733e480a66f8c31f5b78d1b

======================================
Software
======================================
kernel version : 4.15.0-rc4-drm-intel-qa-ww51-commit-bf5cdf9+
hostname : CFL-1
architecture : x86_64
os version : Ubuntu 16.10
os codename : yakkety
kernel driver : i915
bios revision : 104.3
bios release date : 09/14/2017
ksc : 1.5
hardware acceleration : disabled
swap partition : enabled on (/dev/nvme0n1p3)

======================================
Graphic drivers
======================================
libdrm : 2.4.89
cairo : 1.15.11
intel-gpu-tools (tag) : intel-gpu-tools-1.20-232-gc0be331
intel-gpu-tools (commit) : c0be331

======================================
Hardware
======================================
motherboard model : CoffeeLakeClientPlatform
motherboard id : CoffeeLakeSUDIMMRVP
form factor : Desktop
manufacturer : IntelCorporation
cpu family : Other
cpu family id : 6
cpu information : Genuine Intel(R) CPU 0000 @ 3.60GHz
gpu card : Intel Corporation Device 3e92 (prog-if 00 [VGA controller])
memory ram : 15.58 GB
max memory ram : 32 GB
cpu thread : 12
cpu core : 6
cpu model : 158
cpu stepping : 10
socket : Other
current cd clock frequency : 337500 kHz
maximum cd clock frequency : 675000 kHz
displays connected : eDP-1 DP-1

======================================
Firmware
======================================
dmc fw loaded : yes
dmc version : 1.4
guc fw loaded : fetch SUCCESS, load SUCCESS
guc version wanted : wanted 9.39, found 9.39
guc version found : wanted 9.39, found 9.39

======================================
kernel parameters
======================================
quiet drm.debug=0x1e i915.enable_guc=-1 i915.alpha_support=1 auto panic=1 nmi_watchdog=panic intel_iommu=igfx_off resume=/dev/nvme0n1p3 fastboot

Created attachment 136259 [details]
dmesg -w
[ 3565.670199] [IGT] gem_shrink: starting subtest mmap-gtt
[ 3654.707839] Purging GPU memory, 0 pages freed, 3770498 pages still pinned.
[ 3654.707841] 31 and 0 pages still available in the bound and unbound GPU page lists.
[ 3

Quite patently there was a log, and the SIGKILL is due to the oom. Please keep the summary a summary of the bug and not nonsense.

My bad. Got it.

The problem here is that we end up with all 128 threads hitting the reclaim logic, each thread pinning the object it has faulted. Only one thread can make any progress through the oom-logic, but it can't make progress unless it is the one holding struct_mutex. Ergo it reports failure and the oom-killer proceeds without mercy.

There is certainly no quick fix for this.

This test continues to fail on GLK QA.

Tests List:
igt@gem_shrink@mmap-gtt

IGT-Version: 1.21-ga2664f8 (x86_64) (Linux: 4.16.0-rc2-drm-tip-ww9-commit-3a86cab+ x86_64)

First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether the bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.

Created attachment 138866 [details]
dmesg_shrink

(In reply to Jani Saarinen from comment #8)
> ...
> still valid or not...

(In reply to Chris Wilson from comment #6)
> ...
> There is certainly no quick fix for this.

This test still takes an eternity to stop, and the "killed process" messages keep appearing:

[ +0.000000] Out of memory: Kill process 1288 (gem_shrink) score 1000 or sacrifice child
[ +0.000007] Killed process 1288 (gem_shrink) total-vm:185620kB, anon-rss:396kB, file-rss:4kB, shmem-rss:0kB
[ +18.996521] systemd-journald[319]: /dev/kmsg buffer overrun, some messages lost.
[Apr13 18:18] Purging GPU memory, 0 pages freed, 1114112 pages still pinned.

This test is not valid anymore. Closing this bug as INVALID.

Closing now. Feel free to reopen if you still have the issue.
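
For anyone reading the logs in this report later, two of the numbers can be sanity-checked with a few lines of Python. This is only a rough sketch and not part of IGT or i915: the helper names are made up for illustration, and the 4 KiB page size is an assumption (the usual x86-64 default) that the report does not state. It decodes the "status code 137" from the test output and converts the "pages still pinned" counts from dmesg into GiB.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (usual x86-64 default); not stated in the report


def signal_from_exit_status(status):
    """Return the signal number encoded in a shell-style exit status, or None.

    By convention a status above 128 means the process was terminated
    by signal (status - 128); signal 9 is SIGKILL on Linux.
    """
    return status - 128 if status > 128 else None


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count (e.g. from "pages still pinned") into GiB."""
    return pages * page_size / 2**30


if __name__ == "__main__":
    # "Exiting with status code 137" from the IGT output
    print(signal_from_exit_status(137))
    # "3770498 pages still pinned" from the CFL dmesg
    print(round(pages_to_gib(3770498), 2))
    # "1114112 pages still pinned" from the later dmesg
    print(round(pages_to_gib(1114112), 2))
```

Under that page-size assumption, the CFL run has roughly 14.4 GiB pinned when the shrinker frees 0 pages, which is a large share of the RAM listed in the configuration. That is consistent with the oom explanation given in the comments.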
Created attachment 136258 [details]
output