Bug 101982 - Weston crashes when running an OpenGL program on i965
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: EGL/Wayland
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Wayland bug list
QA Contact: mesa-dev
Reported: 2017-07-31 13:13 UTC by Link Mauve
Modified: 2017-08-24 15:29 UTC

Attachments
Stack trace (10.15 KB, text/plain)
2017-07-31 13:56 UTC, Link Mauve

Description Link Mauve 2017-07-31 13:13:01 UTC
This is a regression introduced between Mesa 781263486f and f4d095cc65. I haven’t been able to bisect further because of a build failure (spirv_info.h is not found).

Even if Weston was started with the old revision, a client running with the newer Mesa will make it crash.

I haven’t debugged Weston itself yet, but this isn’t a regression on the Weston side.
Comment 1 Daniel Stone 2017-07-31 13:20:31 UTC
That'd be linux-dmabuf then. Wonderful. Could you please get a backtrace out of Weston somehow? I'm running it on Intel myself (both upstream Weston as well as the atomic branch), but haven't seen any failures which weren't caused by my branch.
Comment 2 Link Mauve 2017-07-31 13:56:20 UTC
Created attachment 133146
Stack trace

Here it is.

I can still reproduce it on Weston master and Mesa master from ten minutes ago, on Linux 4.12.4, running i965 on gen7.
Comment 3 Daniel Stone 2017-07-31 14:16:00 UTC
I guess you're using glvnd ... ? If so: https://patchwork.freedesktop.org/series/28130/
Comment 4 Link Mauve 2017-07-31 14:31:20 UTC
Yes, I am using glvnd; it’s the default in Arch Linux.

Your patch didn’t fix it, but now I have 2784 formats instead.
Comment 5 Emil Velikov 2017-07-31 14:33:41 UTC
There's a misplaced bracket - patch is coming in a second

--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1058,7 +1058,7 @@ intel_query_dma_buf_formats(__DRIscreen *screen, int max,
       return true;
    }
 
-   for (i = 0; i < (ARRAY_SIZE(intel_image_formats)) && j < max; i++) {
+   for (i = 0; (i < ARRAY_SIZE(intel_image_formats)) && j < max; i++) {
      if (intel_image_formats[i].fourcc == __DRI_IMAGE_FOURCC_SARGB8888)
        continue;
Comment 6 Emil Velikov 2017-07-31 14:41:07 UTC
*Scratch that... there should be no difference with/without my suggestion.
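The reason the suggested change makes no difference is that in C the `<` operator binds more tightly than `&&`, so both spellings of the loop condition parse the same way. A minimal sketch of that point follows; the `formats` table, its values, and the skipped entry are made up for illustration and only stand in for `intel_image_formats`, and `ARRAY_SIZE` is defined locally rather than taken from Mesa.

```c
#include <assert.h>
#include <stddef.h>

/* Local definition for this sketch; not copied from Mesa. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the driver's format table. */
static const int formats[] = { 10, 20, 30, 40, 50 };

/* Loop condition spelled as in the existing code. */
static size_t count_original(size_t max)
{
    size_t i, j = 0;
    for (i = 0; i < (ARRAY_SIZE(formats)) && j < max; i++) {
        if (formats[i] == 30)   /* mimics the "continue" on one format */
            continue;
        j++;
    }
    return j;
}

/* Loop condition spelled as in the suggested patch. */
static size_t count_suggested(size_t max)
{
    size_t i, j = 0;
    for (i = 0; (i < ARRAY_SIZE(formats)) && j < max; i++) {
        if (formats[i] == 30)
            continue;
        j++;
    }
    return j;
}

/* Both variants should agree for every limit, including limits
 * larger than the table. */
static void check_equivalence(void)
{
    for (size_t max = 0; max <= 8; max++)
        assert(count_original(max) == count_suggested(max));
}
```

Calling `check_equivalence()` from a test program should pass without tripping the assertion, which is consistent with the observation that the bracket placement is not the cause of the bogus format count.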
Comment 7 Daniel Stone 2017-07-31 14:46:26 UTC
(In reply to Link Mauve from comment #4)
> Yes I am using glvnd, it’s the default in ArchLinux.
> 
> Your patch didn’t fix it, but now I have 2784 formats instead.

That's certainly interesting. So at the '*formats = calloc(...)' line, the value of 'num' is 2784? I can't for the life of me see how that would happen.
Comment 8 Link Mauve 2017-08-03 23:30:44 UTC
When not using the opengl-hq profile of mpv, I get a much better error. I’m still not sure why the compositor doesn’t crash in this case but does in the other one:

[…]
[3156842.470]  -> wl_shell_surface@10.set_title("Intellivision Lives! - mpv")        
VO: [opengl] 640x480 yuv420p                                                         
[3156847.776]  -> wl_shell_surface@10.set_toplevel()                                 
[3156848.866]  -> wl_surface@3.set_buffer_scale(1)                                   
[3156868.364]  -> zwp_linux_dmabuf_v1@18.create_params(new id zwp_linux_buffer_params_v1@16)                                                                              
[3156868.401]  -> zwp_linux_buffer_params_v1@16.add(fd 22, 0, 0, 2560, 16777216, 1)  
[3156868.443]  -> zwp_linux_buffer_params_v1@16.create_immed(new id wl_buffer@19, 640, 480, 875713112, 0)                                                                 
[3156868.473]  -> zwp_linux_buffer_params_v1@16.destroy()                            
[3156868.483]  -> wl_surface@3.attach(wl_buffer@19, 0, 0)                            
[3156868.502]  -> wl_surface@3.damage(0, 0, 2147483647, 2147483647)                  
[3156868.526]  -> wl_surface@3.commit()                                              
[3156868.534]  -> wl_display@1.sync(new id wl_callback@20)                           
[3156868.604] wl_display@1.error(nil, 7, "importing the supplied dmabufs failed")    
[destroyed object]: error 7: importing the supplied dmabufs failed                   
[vo/opengl/wayland] error occurred on the display fd: closing file descriptor        
[ffmpeg] NULL: Invalid NAL unit size (951 > 128).                                    
[ffmpeg] NULL: missing picture in access unit with size 132                          
[vo/opengl/wayland] error occurred on the display fd: closing file descriptor
[…]
with that last error repeating until mpv gives up.
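For readers of the log above: the fourth argument of create_immed (875713112) is a DRM FourCC code, which packs four ASCII characters into a 32-bit value with the first character in the least significant byte. The helper below is a small sketch for decoding it; the function name is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack a DRM FourCC code into its four characters, least significant
 * byte first, and NUL-terminate the result. The name fourcc_to_string
 * is hypothetical, chosen for this sketch. */
static void fourcc_to_string(uint32_t fourcc, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((fourcc >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

Passing 875713112 (0x34325258) to this helper gives the string XR24, which drm_fourcc.h defines as DRM_FORMAT_XRGB8888. That matches the 2560-byte stride in the add request: 640 pixels at 4 bytes each.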
Comment 9 Daniel Stone 2017-08-11 10:18:31 UTC
(In reply to Link Mauve from comment #8)
> When not using the opengl-hq profile of mpv, I get a much better error, I’m
> still not sure why it doesn’t crash the compositor in this case, and it does
> in the other one:
> 
> […]
> [3156842.470]  -> wl_shell_surface@10.set_title("Intellivision Lives! - mpv")
> VO: [opengl] 640x480 yuv420p
> [3156847.776]  -> wl_shell_surface@10.set_toplevel()
> [3156848.866]  -> wl_surface@3.set_buffer_scale(1)
> [3156868.364]  -> zwp_linux_dmabuf_v1@18.create_params(new id zwp_linux_buffer_params_v1@16)
> [3156868.401]  -> zwp_linux_buffer_params_v1@16.add(fd 22, 0, 0, 2560, 16777216, 1)
> [3156868.443]  -> zwp_linux_buffer_params_v1@16.create_immed(new id wl_buffer@19, 640, 480, 875713112, 0)
> [3156868.473]  -> zwp_linux_buffer_params_v1@16.destroy()
> [3156868.483]  -> wl_surface@3.attach(wl_buffer@19, 0, 0)
> [3156868.502]  -> wl_surface@3.damage(0, 0, 2147483647, 2147483647)
> [3156868.526]  -> wl_surface@3.commit()
> [3156868.534]  -> wl_display@1.sync(new id wl_callback@20)
> [3156868.604] wl_display@1.error(nil, 7, "importing the supplied dmabufs failed")
> [destroyed object]: error 7: importing the supplied dmabufs failed
> [vo/opengl/wayland] error occurred on the display fd: closing file descriptor
> [ffmpeg] NULL: Invalid NAL unit size (951 > 128).
> [ffmpeg] NULL: missing picture in access unit with size 132
> [vo/opengl/wayland] error occurred on the display fd: closing file descriptor
> […]
> with that last error repeating until mpv gives up.

Does this still happen with current master? There were some Intel fixes for import which may be useful. If it does, could you please also get a trace showing where inside eglCreateImage -> dri2_create_image_dma_buf -> intel_create_image_from_fds_common the failure actually occurs?
Comment 10 Link Mauve 2017-08-24 15:29:31 UTC
It doesn’t happen anymore on master, as of fe2f5cfdc7439cbe481d4bea393b46395967a8a3.

The main difference is that programs were using the I915_FORMAT_MOD_X_TILED modifier with create_immed, which was failing (and still is) on my HD4000 for some reason. Now Mesa is using I915_FORMAT_MOD_Y_TILED (which works fine here), and it no longer crashes the compositor on an unsupported modifier.
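The X-tiled modifier can be read straight out of the trace in comment 8: the last two arguments of zwp_linux_buffer_params_v1.add are the high and low 32 bits of the 64-bit modifier. The sketch below joins them and splits the result the way the fourcc_mod_code macro in drm_fourcc.h lays it out (vendor in the top 8 bits, vendor-specific code in the low 56). The helper names are made up for this example, and the constants are copied by hand rather than included from the kernel header.

```c
#include <stdint.h>

/* Copied by hand for this sketch: DRM_FORMAT_MOD_VENDOR_INTEL is 0x01,
 * and the Intel codes 1 and 2 are X-tiled and Y-tiled respectively. */
#define SKETCH_VENDOR_INTEL   0x01u
#define SKETCH_INTEL_X_TILED  1u
#define SKETCH_INTEL_Y_TILED  2u

/* Join the modifier_hi and modifier_lo arguments from the protocol. */
static uint64_t join_modifier(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* The vendor lives in the top 8 bits of the modifier. */
static unsigned modifier_vendor(uint64_t modifier)
{
    return (unsigned)(modifier >> 56);
}

/* The vendor-specific code lives in the low 56 bits. */
static uint64_t modifier_code(uint64_t modifier)
{
    return modifier & 0x00ffffffffffffffULL;
}
```

With the values from the trace, `join_modifier(16777216, 1)` gives 0x0100000000000001: vendor 0x01 (Intel) with code 1, that is, I915_FORMAT_MOD_X_TILED.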

As an aside, when I revert 85ef0215dd3fac2d2a141018467361cff92f4bab I still get a crash of the compositor, so this glvnd change was indeed needed (this was asked by Emil).