Bug 110134 - SIGSEGV while playing large hevc video in mpv
Summary: SIGSEGV while playing large hevc video in mpv
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 18.3
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Kenneth Graunke
QA Contact: Intel 3D Bugs Mailing List
 
Reported: 2019-03-15 15:42 UTC by Anthony L. Eden
Modified: 2019-05-10 19:58 UTC (History)

Attachments
output of lshw (24.98 KB, text/plain)
2019-03-15 17:25 UTC, Anthony L. Eden
full backtrace (6.92 KB, text/plain)
2019-05-09 22:22 UTC, LaserEyess
intel_debug=buf log (4.68 KB, text/plain)
2019-05-09 22:24 UTC, LaserEyess

Description Anthony L. Eden 2019-03-15 15:42:28 UTC
* thread #1, name = 'mpv', stop reason = signal SIGSEGV
  * frame #0: 0x00007fa379611cbf libc.so.6`__memmove_avx_unaligned_erms at memmove-vec-unaligned-erms.S:306
    frame #1: 0x00007fa362f82639 i965_dri.so`brw_upload_cs_work_groups_surface(brw=0x00007fa358538010) at brw_wm_surface_state.c:1660
    frame #2: 0x00007fa362f7a829 i965_dri.so`brw_upload_compute_state [inlined] check_and_emit_atom(atom=0x00007fa35854f3f0, state=<unavailable>, brw=0x00007fa358538010) at brw_state_upload.c:496
    frame #3: 0x00007fa362f7a810 i965_dri.so`brw_upload_compute_state at brw_state_upload.c:615
    frame #4: 0x00007fa362f7a6b8 i965_dri.so`brw_upload_compute_state(brw=0x00007fa358538010) at brw_state_upload.c:675
    frame #5: 0x00007fa362f613a8 i965_dri.so`brw_dispatch_compute_common(ctx=0x00007fa358538010) at brw_compute.c:192
    frame #6: 0x00007fa36319f7ff i965_dri.so`_mesa_DispatchCompute at compute.c:265
    frame #7: 0x00007fa36319f762 i965_dri.so`_mesa_DispatchCompute(num_groups_x=480, num_groups_y=270, num_groups_z=1) at compute.c:280
    frame #8: 0x0000556f661adf42 mpv`gl_renderpass_run(ra=0x00007fa358567700, params=0x00007fa368f984c0) at ra_gl.c:1051
    frame #9: 0x0000556f661936db mpv`gl_sc_dispatch_compute(sc=0x00007fa358677ca0, w=480, h=270, d=1) at shader_cache.c:1021
    frame #10: 0x0000556f6619a27c mpv`dispatch_compute(p=0x00007fa358678a30, w=3840, h=2160) at video.c:1165
    frame #11: 0x0000556f6619a379 mpv`finish_pass_tex(p=0x00007fa358678a30, dst_tex=0x00007fa358678e88, w=3840, h=2160) at video.c:1264
    frame #12: 0x0000556f6619cce7 mpv`pass_draw_to_screen(p=0x00007fa358678a30, fbo=<unavailable>) at video.c:2815
    frame #13: 0x0000556f6619fc64 mpv`gl_video_render_frame(p=0x00007fa358678a30, frame=0x00007fa1da397ce0, fbo=<unavailable>, flags=3) at video.c:3124
    frame #14: 0x0000556f661b3f5c mpv`draw_frame(vo=0x0000556f6726b090, frame=0x00007fa1da397ce0) at vo_gpu.c:87
    frame #15: 0x0000556f661b1b27 mpv`vo_render_frame_external(vo=0x0000556f6726b090) at vo.c:898
    frame #16: 0x0000556f661b25f7 mpv`vo_thread(ptr=0x0000556f6726b090) at vo.c:1055
    frame #17: 0x00007fa37d006a9d libpthread.so.0`start_thread(arg=<unavailable>) at pthread_create.c:486
    frame #18: 0x00007fa3795adaf3 libc.so.6`__GI___clone at clone.S:95
Comment 1 Anthony L. Eden 2019-03-15 15:46:14 UTC
To be more accurate, it appears to be a NULL pointer dereference:


(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x4
(gdb)
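
As a hedged illustration of why a fault address of 0x4 suggests a NULL base pointer: an access at a small offset from a NULL pointer faults at exactly that offset. The sketch below is not Mesa code. `toy_num_work_groups`, `toy_access_address`, and `toy_looks_like_null_deref` are hypothetical names, and the 4096-byte threshold is an assumed page size used only as a heuristic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: three 32-bit counters, similar in shape to the
 * x/y/z group counts passed to glDispatchCompute(). Not a Mesa type. */
struct toy_num_work_groups {
    uint32_t x;
    uint32_t y;
    uint32_t z;
};

/* Address that would be touched when accessing a member at `offset`
 * through a pointer whose value is `base`. Nothing is dereferenced. */
uintptr_t toy_access_address(uintptr_t base, size_t offset)
{
    return base + offset;
}

/* Heuristic (assumption): a fault inside the first page is most likely
 * a NULL pointer plus a small offset. */
bool toy_looks_like_null_deref(uintptr_t fault_addr)
{
    return fault_addr < 4096;
}
```

With a NULL base, the `y` member would sit at address 0x4, which matches the si_addr printed above. Which access actually faulted in this crash is not established by this sketch.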
Comment 2 Lionel Landwerlin 2019-03-15 15:56:16 UTC
Is there a specific set of options that need to be given to mpv to reproduce this issue?
Comment 3 Denis 2019-03-15 16:25:59 UTC
Please also provide your SW/HW configuration: Mesa version, GPU, kernel, etc.
Comment 4 Anthony L. Eden 2019-03-15 17:22:24 UTC
(In reply to Lionel Landwerlin from comment #2)
> Is there a specific set of options that need to be given to mpv to reproduce
> this issue?

Nope
Comment 5 Anthony L. Eden 2019-03-15 17:24:35 UTC
(In reply to Denis from comment #3)
> Please also provide your SW/HW configuration: Mesa version, GPU, kernel, etc.

Linux distro is ArchLinux. Output of lshw attached. mesa 18.3.4-1. Linux kernel 5.0.0-arch1-1-ARCH.
Comment 6 Anthony L. Eden 2019-03-15 17:25:08 UTC
Created attachment 143683 [details]
output of lshw
Comment 7 Anthony L. Eden 2019-03-15 19:51:38 UTC
Happened again (doesn't occur frequently). Backtrace from coredump (gdb):

(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x4
(gdb) bt
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:306
#1  0x00007f386c1f4639 in brw_upload_cs_work_groups_surface (brw=0x7f38644bcc40) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_wm_surface_state.c:1660
#2  0x00007f386c1ec829 in check_and_emit_atom (atom=0x7f38644d4020, state=<synthetic pointer>, brw=0x7f38644bcc40) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_state_upload.c:496
#3  brw_upload_pipeline_state (pipeline=BRW_COMPUTE_PIPELINE, brw=0x7f38644bcc40, brw@entry=0x7f38644d4068) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_state_upload.c:615
#4  brw_upload_compute_state (brw=brw@entry=0x7f38644bcc40) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_state_upload.c:675
#5  0x00007f386c1d33a8 in brw_dispatch_compute_common (ctx=0x7f38644bcc40) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_compute.c:192
#6  0x00007f386c4117ff in dispatch_compute (no_error=false, num_groups_z=1, num_groups_y=135, num_groups_x=240) at ../mesa-18.3.4/src/mesa/main/compute.c:265
#7  _mesa_DispatchCompute (num_groups_x=240, num_groups_y=135, num_groups_z=1) at ../mesa-18.3.4/src/mesa/main/compute.c:280
#8  0x000055a08dbdaf42 in gl_renderpass_run (ra=0x7f38644e2cd0, params=0x7f386dc724c0) at ../video/out/opengl/ra_gl.c:1051
#9  0x000055a08dbc06db in gl_sc_dispatch_compute (sc=0x7f38646780a0, w=w@entry=240, h=h@entry=135, d=d@entry=1) at ../video/out/gpu/shader_cache.c:1021
#10 0x000055a08dbc727c in dispatch_compute (p=p@entry=0x7f3864678e30, w=w@entry=1916, h=h@entry=1077, info=...) at ../video/out/gpu/video.c:1165
#11 0x000055a08dbc7379 in finish_pass_tex (p=p@entry=0x7f3864678e30, dst_tex=dst_tex@entry=0x7f3864679288, w=1916, h=1077) at ../video/out/gpu/video.c:1264
#12 0x000055a08dbc9ce7 in pass_draw_to_screen (p=p@entry=0x7f3864678e30, fbo=...) at ../video/out/gpu/video.c:2815
#13 0x000055a08dbccc64 in gl_video_render_frame (p=0x7f3864678e30, frame=frame@entry=0x7f3865434650, fbo=..., flags=flags@entry=3) at ../video/out/gpu/video.c:3124
#14 0x000055a08dbe0f5c in draw_frame (vo=0x55a08e1d0090, frame=0x7f3865434650) at ../video/out/vo_gpu.c:87
#15 0x000055a08dbdeb27 in vo_render_frame_external (vo=vo@entry=0x55a08e1d0090) at ../video/out/vo.c:898
#16 0x000055a08dbdf5f7 in vo_thread (ptr=0x55a08e1d0090) at ../video/out/vo.c:1055
#17 0x00007f3886113a9d in start_thread (arg=<optimized out>) at pthread_create.c:486
#18 0x00007f38826baaf3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Comment 8 Lionel Landwerlin 2019-03-17 18:09:37 UTC
All I can find is that somehow the buffer manager of i965 is failing to allocate or map some memory,
which could indicate there is a memory leak somewhere...

Running mpv with INTEL_DEBUG=buf might print out some traces that would help.
Comment 9 Denis 2019-03-22 07:40:13 UTC
Hello Anthony, could you please point us to a video that triggers the crash, at least as an example?
I downloaded 4 of the available demos (such as the cartoon with the bunny) and ran them about 6 times in a row.

I didn't get any segfaults.

My configuration:
Kernel - 5.0
Distro - Manjaro (all packages up-to-date)
Mesa version - 18.3.4
Env - Gnome-shell
GPU - UHD630

Which display server are you using: X or Wayland?
Comment 10 Anthony L. Eden 2019-03-23 17:05:21 UTC
The video which triggers the crash is available at the following link:

https://drive.google.com/file/d/1d3IcZwYunqWJbIph7gICM2spMKPrh-kz/view?usp=sharing

Just confirmed that the crash still occurs. It required playing the video for 44 minutes and 30 seconds (length of the video is 52:40). No extra command-line args were supplied to mpv.

My desktop environment is i3-wm (X server).
Comment 11 Denis 2019-03-25 11:06:41 UTC
OK, thanks for the video and the clarification. I will try it on my configuration; if nothing turns up, I will also check with the i3 desktop environment.
Comment 12 Denis 2019-03-29 16:26:41 UTC
Hi, sorry for the delay, but it looks like I was able to reproduce your issue. Reproducibility is very poor: it happened only 2 times, so I am continuing to investigate.

Test configuration:
Manjaro
Kernel 5.0
Mesa 19.1.0 git-master
UHD 630 gpu (CFL)
Comment 13 LaserEyess 2019-05-09 22:22:44 UTC
Created attachment 144210 [details]
full backtrace
Comment 14 LaserEyess 2019-05-09 22:24:35 UTC
Created attachment 144211 [details]
intel_debug=buf log

tail -n100 of the log since the full thing is very big (100 MB)
Comment 15 Kenneth Graunke 2019-05-09 22:35:33 UTC
brw_upload_cs_work_groups_surface is leaking buffer objects like nobody's business.
Comment 16 Kenneth Graunke 2019-05-10 00:04:47 UTC
Patch out for review:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/857
Comment 17 LaserEyess 2019-05-10 01:51:19 UTC
(In reply to Kenneth Graunke from comment #16)
> Patch out for review:
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/857

I used to be able to reproduce this issue without fail after watching 20-30 minutes of HEVC video with HDR. Now, with this patch, I can't even trigger it after watching 2 hours of video. 

If it happens again I'll reply to this but for now this patch has fixed it for me.
Comment 18 Kenneth Graunke 2019-05-10 19:58:57 UTC
A slightly simplified version of that patch has landed in master:

commit 3f60810de0a2960ec15118ef9888d9efc9ea605a
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Thu May 9 15:40:13 2019 -0700

    i965: Fix memory leaks in brw_upload_cs_work_groups_surface().
    
    This was taking a reference to the 64kB upload buffer and never
    returning it, leaking a reference each time this atom triggered.
    
    This leaked lots of 64kB upload BOs, eventually running us out of
    of VMA space.  This would usually happen when using mpv to watch a
    movie, after 20-40 minutes.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110134
    Fixes: 63d7b33f516 i965/cs: Setup surface binding for gl_NumWorkGroups
    Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

It's been tagged for backporting to stable branches as well.  Thanks for the report!
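
The leak pattern described in the commit message can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the actual i965 code: every `toy_*` name is hypothetical, the 8-entry pool stands in for finite VMA space, and the 4096-byte upload size is arbitrary. Each upload hands the caller a reference to the current 64kB buffer; if the caller never returns it, retired buffers can never be freed and allocation eventually fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_UPLOAD_SIZE (64 * 1024) /* size of one upload buffer */
#define TOY_POOL_SIZE   8           /* assumption: stand-in for finite VMA space */

/* Hypothetical reference-counted buffer object; refcount 0 means free. */
struct toy_bo { int refcount; };

/* Hypothetical uploader that sub-allocates from its current buffer. */
struct toy_uploader { struct toy_bo *bo; size_t used; };

static struct toy_bo toy_pool[TOY_POOL_SIZE];

static struct toy_bo *toy_bo_alloc(void)
{
    for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
        if (toy_pool[i].refcount == 0) {
            toy_pool[i].refcount = 1;
            return &toy_pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

static void toy_bo_reference(struct toy_bo *bo) { bo->refcount++; }

static void toy_bo_unreference(struct toy_bo *bo)
{
    if (bo == NULL)
        return;
    assert(bo->refcount > 0);
    bo->refcount--; /* reaching 0 returns the slot to the pool */
}

/* Reserve `size` bytes and hand the caller its own reference in *out_bo. */
static bool toy_upload(struct toy_uploader *up, size_t size,
                       struct toy_bo **out_bo)
{
    if (up->bo == NULL || up->used + size > TOY_UPLOAD_SIZE) {
        toy_bo_unreference(up->bo); /* uploader retires the full buffer */
        up->bo = toy_bo_alloc();
        up->used = 0;
        if (up->bo == NULL) {
            *out_bo = NULL;
            return false;
        }
    }
    up->used += size;
    toy_bo_reference(up->bo);
    *out_bo = up->bo;
    return true;
}

/* Run the "atom" repeatedly; returns how many uploads succeeded.
 * release_ref == false models the leak, true models the fix. */
int toy_run_atom(int iterations, bool release_ref)
{
    struct toy_uploader up = { NULL, 0 };

    memset(toy_pool, 0, sizeof(toy_pool));
    for (int i = 0; i < iterations; i++) {
        struct toy_bo *bo = NULL;
        if (!toy_upload(&up, 4096, &bo))
            return i; /* allocation failed */
        /* ... the surface state would be emitted using bo here ... */
        if (release_ref)
            toy_bo_unreference(bo);
    }
    return iterations;
}
```

In this model, `toy_run_atom(1000, false)` stops after 128 uploads (8 buffers of 16 uploads each) because no retired buffer ever drops to a zero reference count, while `toy_run_atom(1000, true)` completes all 1000 by reusing the same slot.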

