Summary: | SIGSEGV while playing large hevc video in mpv | ||
---|---|---|---|
Product: | Mesa | Reporter: | Anthony L. Eden <anthony.louis.eden> |
Component: | Drivers/DRI/i965 | Assignee: | Kenneth Graunke <kenneth> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | major | ||
Priority: | medium | ||
Version: | 18.3 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
output of lshw
full backtrace
intel_debug=buf log
Description
Anthony L. Eden
2019-03-15 15:42:28 UTC
To be more accurate, it appears to be a NULL pointer dereference:

(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x4
(gdb)

Is there a specific set of options that need to be given to mpv to reproduce this issue?

Please also provide your SW/HW configuration: Mesa version, GPU, kernel, etc.

(In reply to Lionel Landwerlin from comment #2)
> Is there a specific set of options that need to be given to mpv to reproduce
> this issue?

Nope.

(In reply to Denis from comment #3)
> Please also provide your SW/HW configuration: Mesa version, GPU, kernel, etc.

Linux distro is ArchLinux. Output of lshw attached. Mesa 18.3.4-1. Linux kernel 5.0.0-arch1-1-ARCH.

Created attachment 143683
output of lshw
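The fault address gdb reports is 0x4 rather than 0, which is what you would expect if code formed an address as a NULL base plus a small offset (a struct field, or a byte offset into a buffer mapping) and then wrote through it. The sketch below only illustrates that address arithmetic; `struct example_header`, `count_field_address()` and `mapped_byte_address()` are hypothetical names, not Mesa code, and the sketch does not show which of the two cases happened here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only; this is not a Mesa structure. */
struct example_header {
    uint32_t magic; /* offset 0 */
    uint32_t count; /* offset 4 */
};

/* Address that a write to ->count would touch, given the struct's base
 * address. Plain integer arithmetic is used, so nothing is dereferenced. */
static uintptr_t count_field_address(uintptr_t base)
{
    return base + offsetof(struct example_header, count);
}

/* Address of byte `offset` inside a mapping that starts at `map`.
 * If a (hypothetical) map call failed and returned NULL, the base is 0
 * and the resulting address is just the offset. */
static uintptr_t mapped_byte_address(uintptr_t map, size_t offset)
{
    return map + offset;
}
```

With a base of 0, both helpers yield 0x4 for an offset of 4, matching the `si_addr` value above; a memmove whose destination is computed this way would fault at exactly that address.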
Happened again (it doesn't occur frequently). Backtrace from the coredump (gdb):

(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x4
(gdb) bt
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:306
#1  0x00007f386c1f4639 in brw_upload_cs_work_groups_surface (brw=0x7f38644bcc40) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_wm_surface_state.c:1660
#2  0x00007f386c1ec829 in check_and_emit_atom (atom=0x7f38644d4020, state=<synthetic pointer>, brw=0x7f38644bcc40) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_state_upload.c:496
#3  brw_upload_pipeline_state (pipeline=BRW_COMPUTE_PIPELINE, brw=0x7f38644bcc40, brw@entry=0x7f38644d4068) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_state_upload.c:615
#4  brw_upload_compute_state (brw=brw@entry=0x7f38644bcc40) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_state_upload.c:675
#5  0x00007f386c1d33a8 in brw_dispatch_compute_common (ctx=0x7f38644bcc40) at ../mesa-18.3.4/src/mesa/drivers/dri/i965/brw_compute.c:192
#6  0x00007f386c4117ff in dispatch_compute (no_error=false, num_groups_z=1, num_groups_y=135, num_groups_x=240) at ../mesa-18.3.4/src/mesa/main/compute.c:265
#7  _mesa_DispatchCompute (num_groups_x=240, num_groups_y=135, num_groups_z=1) at ../mesa-18.3.4/src/mesa/main/compute.c:280
#8  0x000055a08dbdaf42 in gl_renderpass_run (ra=0x7f38644e2cd0, params=0x7f386dc724c0) at ../video/out/opengl/ra_gl.c:1051
#9  0x000055a08dbc06db in gl_sc_dispatch_compute (sc=0x7f38646780a0, w=w@entry=240, h=h@entry=135, d=d@entry=1) at ../video/out/gpu/shader_cache.c:1021
#10 0x000055a08dbc727c in dispatch_compute (p=p@entry=0x7f3864678e30, w=w@entry=1916, h=h@entry=1077, info=...) at ../video/out/gpu/video.c:1165
#11 0x000055a08dbc7379 in finish_pass_tex (p=p@entry=0x7f3864678e30, dst_tex=dst_tex@entry=0x7f3864679288, w=1916, h=1077) at ../video/out/gpu/video.c:1264
#12 0x000055a08dbc9ce7 in pass_draw_to_screen (p=p@entry=0x7f3864678e30, fbo=...) at ../video/out/gpu/video.c:2815
#13 0x000055a08dbccc64 in gl_video_render_frame (p=0x7f3864678e30, frame=frame@entry=0x7f3865434650, fbo=..., flags=flags@entry=3) at ../video/out/gpu/video.c:3124
#14 0x000055a08dbe0f5c in draw_frame (vo=0x55a08e1d0090, frame=0x7f3865434650) at ../video/out/vo_gpu.c:87
#15 0x000055a08dbdeb27 in vo_render_frame_external (vo=vo@entry=0x55a08e1d0090) at ../video/out/vo.c:898
#16 0x000055a08dbdf5f7 in vo_thread (ptr=0x55a08e1d0090) at ../video/out/vo.c:1055
#17 0x00007f3886113a9d in start_thread (arg=<optimized out>) at pthread_create.c:486
#18 0x00007f38826baaf3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

All I can find is that somehow the i965 buffer manager is failing to allocate or map some memory, which could indicate a memory leak somewhere... Running mpv with INTEL_DEBUG=buf might print out some traces that would help.

Hello Anthony, could you please specify which video triggers this, at least as an example? I downloaded 4 of them (from the available demos, such as the cartoon with the bunny, etc.) and ran them about 6 times in a row. I didn't get any segfaults. My configuration:
Kernel - 5.0
Distro - Manjaro (all packages up-to-date)
Mesa version - 18.3.4
Env - Gnome-shell
GPU - UHD630
Which display server are you using, X or Wayland?

The video which triggers the crash is available at the following link: https://drive.google.com/file/d/1d3IcZwYunqWJbIph7gICM2spMKPrh-kz/view?usp=sharing
I just confirmed that the crash still occurs. It required playing the video for 44 minutes and 30 seconds (the video is 52:40 long). No extra command-line arguments were supplied to mpv. My desktop environment is i3-wm (X server).

OK, thanks for the video and the clarification. I will try it on my configuration; if nothing turns up, I will check the i3 desktop environment.

Hi, sorry for the delay, but it looks like I was able to reproduce your issue. Reproducibility is very poor: only 2 times so far, so I am continuing my investigation.
Test configuration: Manjaro, Kernel 5.0, Mesa 19.1.0 git-master, UHD 630 GPU (CFL)

Created attachment 144210
full backtrace
Created attachment 144211
intel_debug=buf log
tail -n100 of the log, since the full thing is very big (100 MB).
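In the backtrace above, frames #10 and #11 show mpv dispatching a compute pass over a 1916x1077 target, while frames #6, #7 and #9 show 240x135x1 work groups. Those counts are what ceiling division gives for an 8x8 local work-group size; that block size is an assumption inferred from the numbers, not something the backtrace states. Because the dispatch sits under `gl_video_render_frame`, it is plausibly issued for many frames over a long playback. `groups_needed()` is a hypothetical helper, not an mpv or Mesa function.

```c
#include <assert.h>
#include <stdint.h>

/* Number of work groups needed to cover `size` pixels along one axis when
 * each group covers `block` pixels (ceiling division, so partial blocks
 * at the edge still get a group). */
static uint32_t groups_needed(uint32_t size, uint32_t block)
{
    assert(block > 0);
    return (size + block - 1) / block;
}
```

Under the 8x8 assumption, `groups_needed(1916, 8)` is 240 and `groups_needed(1077, 8)` is 135, matching `num_groups_x` and `num_groups_y` in frame #7.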
brw_upload_cs_work_groups_surface is leaking buffer objects like nobody's business. Patch out for review: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/857

(In reply to Kenneth Graunke from comment #16)
> Patch out for review:
> https://gitlab.freedesktop.org/mesa/mesa/merge_requests/857

I used to be able to reproduce this issue without fail after watching 20-30 minutes of HEVC video with HDR. Now, with this patch, I can't trigger it even after watching 2 hours of video. If it happens again I'll reply here, but for now this patch has fixed it for me.

A slightly simplified version of that patch has landed in master:

commit 3f60810de0a2960ec15118ef9888d9efc9ea605a
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Thu May 9 15:40:13 2019 -0700

    i965: Fix memory leaks in brw_upload_cs_work_groups_surface().

    This was taking a reference to the 64kB upload buffer and never
    returning it, leaking a reference each time this atom triggered.

    This leaked lots of 64kB upload BOs, eventually running us out of
    VMA space. This would usually happen when using mpv to watch a movie,
    after 20-40 minutes.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110134
    Fixes: 63d7b33f516 i965/cs: Setup surface binding for gl_NumWorkGroups
    Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

It's been tagged for backporting to stable branches as well. Thanks for the report!
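The commit message describes a classic reference-count leak: a function takes a reference to a shared upload buffer and never drops it. The sketch below models that pattern with a hypothetical reference-counted buffer object; `struct sketch_bo`, `upload_data()` and the `emit_surface_*()` functions are illustrative names, not Mesa's API, and the "fixed" variant assumes nothing else needs the caller's reference once it is done with the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer object standing in for a 64 kB
 * upload BO. Illustrative only; not Mesa's brw_bo. */
struct sketch_bo {
    int refcount;
};

static struct sketch_bo *bo_create(void)
{
    struct sketch_bo *bo = malloc(sizeof(*bo));
    if (bo)
        bo->refcount = 1; /* the uploader that created it holds one reference */
    return bo;
}

static void bo_reference(struct sketch_bo *bo)
{
    bo->refcount++;
}

/* Drops one reference; frees the BO and returns true if it was the last. */
static bool bo_unreference(struct sketch_bo *bo)
{
    assert(bo->refcount > 0);
    if (--bo->refcount == 0) {
        free(bo);
        return true;
    }
    return false;
}

/* Hypothetical upload helper: gives the caller its own reference to the
 * shared upload buffer, which the caller must drop when done. */
static struct sketch_bo *upload_data(struct sketch_bo *upload_buffer)
{
    bo_reference(upload_buffer);
    return upload_buffer;
}

/* Leaky pattern: takes a reference and never returns it. */
static void emit_surface_leaky(struct sketch_bo *upload_buffer)
{
    struct sketch_bo *bo = upload_data(upload_buffer);
    (void)bo; /* ... bo would be used to build surface state here ... */
}

/* Fixed pattern: drops the caller's reference after using the buffer. */
static void emit_surface_fixed(struct sketch_bo *upload_buffer)
{
    struct sketch_bo *bo = upload_data(upload_buffer);
    /* ... bo would be used to build surface state here ... */
    bo_unreference(bo);
}
```

In this sketch, each call to `emit_surface_leaky()` leaves the count one higher, so when the uploader later drops its own reference to move on to a fresh buffer, `bo_unreference()` returns false and the old buffer is never freed. Repeated across many buffers, that is the kind of accumulation of 64 kB BOs the commit message describes.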