Bug 101711 - [chv] Heavy web pages or high io load makes rendering stuck
Summary: [chv] Heavy web pages or high io load makes rendering stuck
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: 17.1
Hardware: x86-64 (AMD64) Linux (All)
: medium critical
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-06 22:06 UTC by russianneuromancer
Modified: 2019-07-31 09:13 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg of Linux 2017-07-06 drm-tip with drm.debug=0x1e log_buf_len=1M (1009.28 KB, text/plain)
2017-07-06 22:06 UTC, russianneuromancer
Details

Note You need to log in before you can comment on or make changes to this bug.
Description russianneuromancer 2017-07-06 22:06:57 UTC
Created attachment 132484 [details]
dmesg of Linux 2017-07-06 drm-tip with drm.debug=0x1e log_buf_len=1M

Hello!

Rendering of heavy web pages like pages with many gif images (https://www.theverge.com/tech) maps, just heavy web-sites, frequently lead to state that looks like GPU hang - while tablet is accessible via ssh, rendering on display just stop for a few minutes (or sometimes until web browser is terminated) however /sys/class/drm/card0/error return "No error state collected". Killing web browser process could bring system to life faster, however this is possible only remotely which is not an option for tablet (hence critical severity).

First example:
Enable GPU rendering in Chrome/Opera/Chromium settings. (Also reproducible with disabled GPU rendering.)
Visit https://www.theverge.com/walt-mossberg-verge
Scroll down to "MOSSBERG: THE DISAPPEARING COMPUTER" animation, wait few seconds.

Second example:
Disable GPU rendering in Chrome/Opera/Chromium settings. (Also reproducible with enabled GPU rendering, but less frequently, so keep it disabled for this particular page.)
Visit https://www.theverge.com/tech
Slowly scroll this page up and down, few seconds later rendering will stuck.

Hardware:
Dell Vanue 8 Pro 5855 with x5-Z8500 CPU.

Software:
Ubuntu Gnome 17.04 x86_64
Linux 4.12.0rc7 and Intel drm-tip kernel (cod/tip/drm-tip/2017-07-06, cefc82aab8bb8ec201e922cf23d227c47845094c)
Mesa 17.1.2, libdrm 2.4.80
Issue is reproducible with both of modesetting DDX and intel DDX.
Display is rotated in System Settings.
I reproduced this issue with Opera web-browser (45, 46, 47).
Comment 1 russianneuromancer 2017-07-19 21:39:25 UTC
I find that this issue is reproducible only with HD/2GB RAM version of Dell 5855, and does not reproducible with FHD/4GB RAM version of Dell 5855.

I also was able to reproduce this issue with Linux 4.12.2.
Comment 2 russianneuromancer 2017-08-01 20:46:23 UTC
Issue is still reproducible on Linux 4.13rc3 with Mesa git (17.3~git170731230100.df61a05~z~padoka0) and Intel DDX git (2.99.917+git1708011513.2100efa~x~padoka0).
Comment 3 Chris Wilson 2017-08-23 08:44:53 UTC
If your working-set size does hit swap, then you are in for a terrible time. For the borderline case, https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=shrinker should hopefully help.

vmstat is a good guide to see if the system is stuck on [swap] io.

But ultimately if the working-set is larger than RAM, the only recourse is to reduce it.
Comment 4 russianneuromancer 2017-08-31 23:11:30 UTC
Hello!

Thank you for looking into this

> If your working-set size does hit swap, then you are in for a terrible time. For the borderline case, https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=shrinker should hopefully help.

In this round of testing (with "shrinker" kernel and upstream 4.13rc6) I noticed that now animation playback is struggle few times before it completely stuck (earlier it always happened on first playback). Unfortunately I didn't find noticeable difference between "shrinker" kernel and upstream 4.13rc6.

Later, while I tested 4.13rc7, I also find that if I switch to another tab in the beginning of playback, and give system enough time to push data from RAM to swap, then I can switch back, let browser playback animation, and does not get freeze anymore - does it give hint where root cause can be?
Comment 5 fawad 2018-05-05 18:52:19 UTC
Thanks, I am looking for this I appreciated 
https://fawadkhan.me
Comment 6 Hans de Goede 2019-07-31 08:16:13 UTC
I believe I hit this yesterday too, without a too large working set, but with lots of disk io instead.

I was only running ubuntu 19.04 + gnome-shell, no browser or anything open. "free" said used was about 700MB of 2G of RAM. Still when copying a large file I got a rendering hickup / stall in gnome-shell and this in dmesg:

[ 9594.079169] gnome-shell: page allocation failure: order:0, mode:0x40810(GFP_NOWAIT|__GFP_COMP|__GFP_RECLAIMABLE), nodemask=(null),cpuset=/,mems_allowed=0
[ 9594.079187] CPU: 3 PID: 1031 Comm: gnome-shell Tainted: G        WC        5.1.0-994-generic #201905172204
[ 9594.079189] Hardware name: IRBIS TW90/Default string, BIOS IRBIS12i.WT101P.JtBJRFA08 10/20/2017
[ 9594.079191] Call Trace:
[ 9594.079207]  dump_stack+0x63/0x8a
[ 9594.079213]  warn_alloc.cold.117+0x7b/0xfb
[ 9594.079219]  __alloc_pages_slowpath+0xe68/0xeb0
[ 9594.079225]  __alloc_pages_nodemask+0x2df/0x330
[ 9594.079231]  alloc_pages_current+0x81/0xe0
[ 9594.079235]  new_slab+0x337/0x740
[ 9594.079239]  ___slab_alloc+0x373/0x4f0
[ 9594.079353]  ? request_alloc_slow.isra.31+0x2f/0x5c [i915]
[ 9594.079427]  ? __i915_request_create+0x58/0x270 [i915]
[ 9594.079431]  ? __alloc_pages_nodemask+0x2df/0x330
[ 9594.079434]  __slab_alloc+0x20/0x40
[ 9594.079437]  kmem_cache_alloc+0x194/0x1c0
[ 9594.079510]  ? request_alloc_slow.isra.31+0x2f/0x5c [i915]
[ 9594.079584]  request_alloc_slow.isra.31+0x2f/0x5c [i915]
[ 9594.079655]  __i915_request_create+0x250/0x270 [i915]
[ 9594.079716]  __engine_park+0x91/0xc0 [i915]
[ 9594.079774]  __intel_wakeref_put_last+0x1c/0x50 [i915]
[ 9594.079834]  intel_engine_pm_put+0x40/0x50 [i915]
[ 9594.079894]  intel_context_exit_engine+0x19/0x20 [i915]
[ 9594.079958]  i915_request_retire+0x2ed/0x350 [i915]
[ 9594.080022]  ring_retire_requests+0x48/0x70 [i915]
[ 9594.080086]  i915_retire_requests+0x42/0x90 [i915]
[ 9594.080150]  i915_gem_shrink+0xb5/0x480 [i915]
[ 9594.080154]  ? __switch_to_asm+0x34/0x70
[ 9594.080218]  i915_gem_object_get_pages_gtt+0x1ab/0x6a0 [i915]
[ 9594.080283]  ? i915_vma_instance+0xd8/0x480 [i915]
[ 9594.080286]  ? prep_new_page+0x49/0x130
[ 9594.080349]  ____i915_gem_object_get_pages+0x22/0x40 [i915]
[ 9594.080412]  __i915_gem_object_get_pages+0x5b/0x70 [i915]
[ 9594.080477]  __i915_vma_do_pin+0x2d0/0x4e0 [i915]
[ 9594.080539]  eb_lookup_vmas+0x54b/0xba0 [i915]
[ 9594.080602]  i915_gem_do_execbuffer+0x466/0x1160 [i915]
[ 9594.080609]  ? xas_store+0x59/0x5c0
[ 9594.080617]  ? mem_cgroup_commit_charge+0x82/0x490
[ 9594.080620]  ? __kmalloc_node+0x1de/0x2b0
[ 9594.080697]  i915_gem_execbuffer2_ioctl+0x1c9/0x360 [i915]
[ 9594.080768]  ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0 [i915]
[ 9594.080820]  drm_ioctl_kernel+0xb0/0x100 [drm]
[ 9594.080849]  drm_ioctl+0x233/0x410 [drm]
[ 9594.080920]  ? i915_gem_execbuffer_ioctl+0x2b0/0x2b0 [i915]
[ 9594.080926]  do_vfs_ioctl+0xa9/0x640
[ 9594.080930]  ? handle_mm_fault+0xe1/0x210
[ 9594.080933]  ksys_ioctl+0x67/0x90
[ 9594.080936]  __x64_sys_ioctl+0x1a/0x20
[ 9594.080943]  do_syscall_64+0x5a/0x110
[ 9594.080947]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9594.080951] RIP: 0033:0x7f69500a6417
[ 9594.080956] Code: 00 00 90 48 8b 05 79 0a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 0a 0d 00 f7 d8 64 89 01 48
[ 9594.080959] RSP: 002b:00007fff860cf8c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 9594.080962] RAX: ffffffffffffffda RBX: 000055fa530c4980 RCX: 00007f69500a6417
[ 9594.080964] RDX: 00007fff860cf910 RSI: 0000000040406469 RDI: 000000000000000b
[ 9594.080966] RBP: 00007fff860cf910 R08: 000055fa531049e0 R09: 00007f694311b010
[ 9594.080968] R10: 00007fff860cf900 R11: 0000000000000246 R12: 0000000040406469
[ 9594.080969] R13: 000000000000000b R14: ffffffffffffffff R15: 0000000000000000
[ 9594.080973] Mem-Info:
[ 9594.080983] active_anon:43723 inactive_anon:53610 isolated_anon:0
                active_file:44649 inactive_file:239254 isolated_file:0
                unevictable:10371 dirty:136 writeback:0 unstable:0
                slab_reclaimable:17485 slab_unreclaimable:25259
                mapped:28815 shmem:26900 pagetables:3257 bounce:0
                free:37391 free_pcp:378 free_cma:0
[ 9594.080989] Node 0 active_anon:174892kB inactive_anon:214440kB active_file:178596kB inactive_file:957016kB unevictable:41484kB isolated(anon):0kB isolated(file):0kB mapped:115260kB dirty:544kB writeback:0kB shmem:107600kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 9594.080992] Node 0 DMA free:7788kB min:372kB low:464kB high:556kB active_anon:96kB inactive_anon:436kB active_file:904kB inactive_file:4884kB unevictable:832kB writepending:0kB present:15984kB managed:15896kB mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 9594.081000] lowmem_reserve[]: 0 1857 1857 1857 1857
[ 9594.081004] Node 0 DMA32 free:141776kB min:142984kB low:154152kB high:165320kB active_anon:174796kB inactive_anon:213900kB active_file:177756kB inactive_file:952080kB unevictable:40560kB writepending:512kB present:2008988kB managed:1950576kB mlocked:48kB kernel_stack:7680kB pagetables:13028kB bounce:0kB free_pcp:1512kB local_pcp:428kB free_cma:0kB
[ 9594.081011] lowmem_reserve[]: 0 0 0 0 0
[ 9594.081014] Node 0 DMA: 85*4kB (UM) 47*8kB (UM) 22*16kB (UM) 12*32kB (UM) 9*64kB (UM) 7*128kB (U) 5*256kB (UME) 3*512kB (UME) 0*1024kB 1*2048kB (M) 0*4096kB = 7788kB
[ 9594.081030] Node 0 DMA32: 3068*4kB (UMH) 3382*8kB (UMEH) 1547*16kB (UMEH) 1206*32kB (UMH) 455*64kB (UM) 78*128kB (ME) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 141776kB
[ 9594.081048] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 9594.081050] 313991 total pagecache pages
[ 9594.081054] 3125 pages in swap cache
[ 9594.081056] Swap cache stats: add 132061, delete 128936, find 37320/43049
[ 9594.081057] Free swap  = 2673404kB
[ 9594.081058] Total swap = 2942972kB
[ 9594.081059] 506243 pages RAM
[ 9594.081061] 0 pages HighMem/MovableOnly
[ 9594.081062] 14625 pages reserved
[ 9594.081063] 0 pages cma reserved
[ 9594.081064] 0 pages hwpoisoned
[ 9594.081068] SLUB: Unable to allocate memory on node -1, gfp=0x800(GFP_NOWAIT)
[ 9594.081071]   cache: i915_request, object size: 584, buffer size: 640, default order: 2, min order: 0
[ 9594.081074]   node 0: slabs: 1, objs: 25, free: 0


Which indicates that even if the working set is ok, doing some diskio (this was copying a large file to a microsd, so not very high speed io) which fills the buffers can cause this too.

I guess in this case the problem was that the file source (eMMC) was probably faster then the destination leading to lots of dirty write buffers...
Comment 7 russianneuromancer 2019-07-31 08:21:50 UTC
Hans, you are using CherryTrail-based device too, right?
Comment 8 Hans de Goede 2019-07-31 09:13:27 UTC
(In reply to russianneuromancer from comment #7)
> Hans, you are using CherryTrail-based device too, right?

Right, I hit this on an Irbis TW90 tablet which uses a x5-Z8350 SoC.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.