Bug 111512

Summary: [snb] GPU HANG in gnome-shell (after swap?)
Product: DRI Reporter: Chris Murphy <bugzilla>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED MOVED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major    
Priority: medium CC: intel-gfx-bugs, lakshminarayana.vudum
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard: Triaged, ReadyForDev
i915 platform: SNB i915 features: GPU hang
Attachments:
Description Flags
dmesg none
drm card0 error none
lspci -vvnn none
dmesg kernel 5.2.11 none
dmesg conventional swap, 5.3.0rc6 none

Description Chris Murphy 2019-08-29 06:00:44 UTC
kernel 5.3.0-0.rc6.git1.1.fc32.x86_64

00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09) (prog-if 00 [VGA controller])

I'm getting a page allocation failure message, followed by a call trace, followed by a GPU HANG message.

[  846.385033] fmac.local kernel: gnome-shell: page allocation failure: order:0, mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE), nodemask=(null),cpuset=/,mems_allowed=0
Comment 1 Chris Murphy 2019-08-29 06:01:49 UTC
Created attachment 145193 [details]
dmesg
Comment 2 Chris Murphy 2019-08-29 06:02:08 UTC
Created attachment 145194 [details]
drm card0 error
Comment 3 Chris Murphy 2019-08-29 06:04:39 UTC
Created attachment 145195 [details]
lspci -vvnn
Comment 4 Chris Wilson 2019-08-29 10:17:01 UTC
Allocation failure on swap-in is not uncommon when dealing with 5G+ of swap, the kernel struggles to cope and we make more noise than most. That failure does not look to be the cause of the later hang, though it may indeed be related to memory pressure (although being snb it is llc so less susceptible to most forms of corruption, you can still hypothesize data not making it to/from swap that leads to context corruption). I would say the memory layout of the batch supports the hypothesis that the context has been swapped out and back in. So I am going to err on the side of assuming this is an invalid context image due to swap.
Comment 5 Chris Murphy 2019-08-29 16:32:14 UTC
Created attachment 145211 [details]
dmesg kernel 5.2.11

Point of comparison with a different kernel. It looks like the same thing. I guess I just don't see these messages with the non-debug kernels.
Comment 6 Chris Murphy 2019-08-29 16:47:22 UTC
(In reply to Chris Wilson from comment #4)
> Allocation failure on swap-in is not uncommon when dealing with 5G+ of swap,
> the kernel struggles to cope and we make more noise than most.

Interesting. This suggests an incongruence between typical 1:1 RAM swap partition sizes by most distro installers, at least for use cases where there will be heavy pressure on RAM rather than incidental swap usage. In your view, is this a case of, "doctor, it hurts when I do this" and the doctor says, "right, so don't do that" or is there room for improvement?

Note: these examples are unique in that the test system is using swap on ZRAM. So it should be significantly faster than conventional swap on a partition. Also, these examples have /dev/zram0 sized to 1.5X RAM, but it's reproducible at 1:1. In smaller swap cases, I've seen these same call traces far less frequently, and also oom-killer happens more frequently.

> That failure
> does not look to be the cause of the later hang, though it may indeed be
> related to memory pressure (although being snb it is llc so less susceptible
> to most forms of corruption, you can still hypothesize data not making it
> to/from swap that leads to context corruption). I would say the memory
> layout of the batch supports the hypothesis that the context has been
> swapped out and back in. So I am going to err on the side of assuming this
> is an invalid context image due to swap.

The narrow goal of this torture test is to find ways of improving system responsiveness under heavy swap use. And also it acts much like an unprivileged fork bomb that can, somewhat non-deterministically I'm finding, take down the system (totally unresponsive for >30 minutes). And in doing that, I'm stumbling over other issues like this one.

For desktops, it's a problem to not have swap big enough to support hibernation.
Comment 7 Chris Wilson 2019-08-29 17:00:03 UTC
(In reply to Chris Murphy from comment #6)
> (In reply to Chris Wilson from comment #4)
> > Allocation failure on swap-in is not uncommon when dealing with 5G+ of swap,
> > the kernel struggles to cope and we make more noise than most.
> 
> Interesting. This suggests an incongruence between typical 1:1 RAM swap
> partition sizes by most distro installers, at least for use cases where
> there will be heavy pressure on RAM rather than incidental swap usage. In
> your view, is this a case of, "doctor, it hurts when I do this" and the
> doctor says, "right, so don't do that" or is there room for improvement?

It's definitely the kernel's problem in mishandling resources, there are plenty still available, we just aren't getting the pages when they are required, as they are required. Aside from that, we are not prioritising interactive workloads very well under these conditions. From our point of view that only increases the mempressure for graphic resources -- work builds up faster than we can process, write amplification from client to display.

> Note: these examples are unique in that the test system is using swap on
> ZRAM. So it should be significantly faster than conventional swap on a
> partition. Also, these examples have /dev/zram0 sized to 1.5X RAM, but it's
> reproducible at 1:1. In smaller swap cases, I've seen these same call traces
> far less frequently, and also oom-killer happens more frequently.
> 
> > That failure
> > does not look to be the cause of the later hang, though it may indeed be
> > related to memory pressure (although being snb it is llc so less susceptible
> > to most forms of corruption, you can still hypothesize data not making it
> > to/from swap that leads to context corruption). I would say the memory
> > layout of the batch supports the hypothesis that the context has been
> > swapped out and back in. So I am going to err on the side of assuming this
> > is an invalid context image due to swap.
> 
> The narrow goal of this torture test is to find ways of improving system
> responsiveness under heavy swap use. And also it acts much like an
> unprivileged fork bomb that can, somewhat non-deterministically I'm finding,
> take down the system (totally unresponsive for >30 minutes). And in doing
> that, I'm stumbling over other issues like this one.

Yup. Death-by-swap is an old problem (when the oomkiller doesn't kill you, you can die of old age waiting for a response wishing it had). Most of our effort is spent trying to minimise the system-wide impact when running at max memory (when the caches are regularly reaped); handling swap well has been an afterthought for a decade.
Comment 8 Chris Murphy 2019-08-29 17:43:27 UTC
Created attachment 145212 [details]
dmesg conventional swap, 5.3.0rc6

This is perhaps superfluous: a test with conventional swap on a plain partition on SSD, and the same thing happens. We can say it's not caused by swap on ZRAM.
Comment 9 Chris Murphy 2019-08-29 17:57:25 UTC
(In reply to Chris Wilson from comment #7)
> It's definitely the kernel's problem in mishandling resources, there are
> plenty still available, we just aren't getting the pages when they are
> required, as they are required. 

I see this very clearly in the conventional swap on SSD case above, where top reports ~60% wa, and while free RAM is low, there's still quite a lot of swap left. But there's not a lot of activity compared to the swap on ZRAM case.

$ cat /proc/meminfo
MemTotal:        8025296 kB
MemFree:          120132 kB
MemAvailable:     119600 kB
Buffers:              84 kB
Cached:           232996 kB
SwapCached:       601992 kB
Active:          6403420 kB
Inactive:         980736 kB
Active(anon):    6309056 kB
Inactive(anon):   914428 kB
Active(file):      94364 kB
Inactive(file):    66308 kB
Unevictable:       23220 kB
Mlocked:               0 kB
SwapTotal:       8214524 kB
SwapFree:        3756296 kB
Dirty:               840 kB
Writeback:             0 kB
AnonPages:       6899812 kB
Mapped:           128652 kB
Shmem:             72784 kB
KReclaimable:     116684 kB
Slab:             324752 kB
SReclaimable:     116684 kB
SUnreclaim:       208068 kB
KernelStack:       15296 kB
PageTables:        44364 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12227172 kB
Committed_AS:   15204776 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       40452 kB
VmallocChunk:          0 kB
Percpu:            20864 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      381944 kB
DirectMap2M:     7917568 kB
[chris@fmac ~]$ 
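To put numbers on "free RAM is low, there's still quite a lot of swap left", here is a minimal Python sketch that parses /proc/meminfo-style text and derives swap usage. The function names `parse_meminfo` and `swap_summary` are made up for illustration, and `SAMPLE` is just a handful of fields copied from the dump above. The Committed_AS minus CommitLimit figure is only a rough indicator, since the kernel enforces CommitLimit only in strict overcommit mode.

```python
def parse_meminfo(text):
    """Return {field: int} from /proc/meminfo-style text; kB values stay in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip shell prompts, blank lines, and anything not "Name: <int> [kB]".
        if not sep or not parts or not parts[0].isdigit():
            continue
        fields[name.strip()] = int(parts[0])
    return fields


def swap_summary(fields):
    """Derive a few pressure indicators from parsed meminfo fields."""
    total = fields["SwapTotal"]
    used = total - fields["SwapFree"]
    return {
        "swap_used_kb": used,
        "swap_used_pct": round(100 * used / total, 1) if total else 0.0,
        "mem_available_pct": round(
            100 * fields["MemAvailable"] / fields["MemTotal"], 1
        ),
        # Positive means more memory has been promised than CommitLimit.
        "overcommit_kb": fields["Committed_AS"] - fields["CommitLimit"],
    }


# A subset of the values reported in this comment.
SAMPLE = """\
MemTotal:        8025296 kB
MemAvailable:     119600 kB
SwapTotal:       8214524 kB
SwapFree:        3756296 kB
CommitLimit:    12227172 kB
Committed_AS:   15204776 kB
"""

if __name__ == "__main__":
    print(swap_summary(parse_meminfo(SAMPLE)))
```

On a live system the same functions could be fed `open("/proc/meminfo").read()` instead of `SAMPLE`. For the values above, a bit over half of swap is in use while MemAvailable is under 2% of MemTotal.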


> Yup. Death-by-swap is an old problem (when the oomkiller doesn't kill you,
> you can die of old age waiting for a response wishing it had). Most of our
> effort is spent trying to minimise the system-wide impact when running at
> max memory (when the caches are regularly reaped), handling swap well has
> been an afterthought for a decade.

I've tried quite a lot of variations, different sized swaps, swap on ZRAM, and zswap. And mostly it seems like rearranging deck chairs. I'm not getting enough quality data to have any idea which one is even marginally better, there's always some trade off. I guess I should focus instead on ways of containing unprivileged fork bombs - better they get mad in their own box than take down the whole system.
Comment 10 Chris Wilson 2019-10-08 20:48:49 UTC
*** Bug 111930 has been marked as a duplicate of this bug. ***
Comment 11 Martin Peres 2019-11-29 19:25:16 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/385.
