Bug 93750

Summary: Xwayland crashes with SIGBUS when processing PutImage
Product: Wayland
Reporter: Jonas Ådahl <jadahl>
Component: XWayland
Assignee: Wayland bug list <wayland-bugs>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Attachments: Xwayland SIGBUS backtrace

Description Jonas Ådahl 2016-01-18 03:28:18 UTC
Created attachment 121102
Xwayland SIGBUS backtrace

Under certain circumstances Xwayland crashes with SIGBUS when processing PutImage damage.

I can reproduce this by opening a large page in Firefox and scrolling up and down by holding page up/down. It reproduces quite reliably.

See attachment for backtrace.
Comment 1 Michel Dänzer 2016-01-18 03:50:25 UTC
This might be a kernel DRM driver issue. Which driver are you using?

BTW, it looks like glamor isn't enabled in Xwayland. Does enabling it help?
Comment 2 Jonas Ådahl 2016-01-18 04:31:54 UTC
(In reply to Michel Dänzer from comment #1)
> This might be a kernel DRM driver issue. Which driver are you using?
> 
> BTW, it looks like glamor isn't enabled in Xwayland. Does enabling it help?

I'm using the Intel driver. I'm not using glamor, no. Last time I tried (quite a while ago) it wasn't working at all, but I can try again.

One thing I noticed was that /run/user/1000/ was 100% full when the SIGBUS was hit. I couldn't see any error messages about a failed allocation, so I'm not sure that has anything to do with it. I noticed this because I couldn't start another weston session (it complained about /run/user/1000 being full) until I detached GDB from Xwayland, letting it free the resources it had allocated on that tmpfs.
Comment 3 Jonas Ådahl 2016-01-18 07:01:03 UTC
Seems like enabling glamor makes the issue go away.
Comment 4 Pekka Paalanen 2016-01-18 10:22:38 UTC
(In reply to Jonas Ådahl from comment #2)
> One thing I noticed was that /run/user/1000/ was 100% full when SIGBUS was
> hit, but I couldn't see any error messages about failed allocation, but not
> sure that has anything to do with it.

I think this is the cause. For an explanation why a full fs causes a SIGBUS, see:
http://cgit.freedesktop.org/wayland/wayland/commit/?id=011b6954031a25de8d9eb39631b6837553bb3cfb

Can you somehow check if the target memory is indeed mmapped from that fs and if so, how does the file get created? This would allow a graceful failure at least.

I think the next question would be whether the mmap is of the expected size, and should something throttle/reclaim somewhere to avoid filling the fs.
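
To illustrate the difference discussed above, here is a minimal sketch (not the actual Xwayland or Wayland code) of reserving backing storage up front with posix_fallocate(), so that a full filesystem is reported as an ordinary error when the file is created rather than as a SIGBUS when the mapping is first written. The helper name create_backed_file and the temp file name pattern are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temp file under `dir` and
 * reserve `size` bytes of storage for it.
 * Returns an fd on success, or -1 with errno set (e.g. ENOSPC). */
static int
create_backed_file(const char *dir, off_t size)
{
    char path[4096];
    int fd, ret;

    assert(dir != NULL);

    if (snprintf(path, sizeof path, "%s/shm-sketch-XXXXXX", dir) >=
        (int) sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the fd keeps the file alive */

    /* Unlike ftruncate(), this allocates the blocks now, so a full
     * filesystem fails here instead of faulting on first access.
     * posix_fallocate() returns the error number; it does not set errno. */
    ret = posix_fallocate(fd, 0, size);
    if (ret != 0) {
        close(fd);
        errno = ret;
        return -1;
    }

    return fd;
}
```

A caller would check for -1 and, on ENOSPC, fail the buffer allocation gracefully before ever calling mmap() on the fd.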
Comment 5 Jonas Ådahl 2016-01-18 10:42:01 UTC
(In reply to Pekka Paalanen from comment #4)
> (In reply to Jonas Ådahl from comment #2)
> > One thing I noticed was that /run/user/1000/ was 100% full when SIGBUS was
> > hit, but I couldn't see any error messages about failed allocation, but not
> > sure that has anything to do with it.
> 
> I think this is the cause. For an explanation why a full fs causes a SIGBUS,
> see:
> http://cgit.freedesktop.org/wayland/wayland/commit/?id=011b6954031a25de8d9eb39631b6837553bb3cfb
> 
> Can you somehow check if the target memory is indeed mmapped from that fs
> and if so, how does the file get created? This would allow a graceful
> failure at least.

Seems my xwayland is building without posix_fallocate support, so it is most likely the "on-the-fly ENOSPC" SIGBUS then.

It also seems xwayland will never build with posix_fallocate because it is missing the configure.ac rule. I will send a patch fixing that, and see whether we then manage to fail gracefully.

Should ftruncate even be supported? To support it while still being able to fail gracefully (i.e. not crash), we'd need to add SIGBUS handlers here and there, I assume.
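
As a rough idea of what such a SIGBUS handler could look like, here is a minimal sketch, not taken from Xwayland, that guards a single write with sigsetjmp()/siglongjmp(). The names guarded_memset and map_empty_file are hypothetical. map_empty_file exists only to trigger the fault deterministically: it maps a page past the end of an empty file, which also raises SIGBUS on access. The sketch is single-threaded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;
static volatile sig_atomic_t sigbus_armed;

static void
sigbus_handler(int signo)
{
    (void) signo;
    if (sigbus_armed)
        siglongjmp(sigbus_jmp, 1);
    /* Not a guarded access: fall back to the default action. */
    signal(SIGBUS, SIG_DFL);
    raise(SIGBUS);
}

/* Hypothetical helper: memset() that returns -1 instead of crashing
 * when the destination faults with SIGBUS. Returns 0 on success. */
static int
guarded_memset(void *dst, int value, size_t len)
{
    struct sigaction sa, old;
    int ret = 0;

    assert(dst != NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigbus_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) < 0)
        return -1;

    /* savemask=1 so the signal mask is restored after the jump. */
    if (sigsetjmp(sigbus_jmp, 1) == 0) {
        sigbus_armed = 1;
        memset(dst, value, len);
    } else {
        ret = -1; /* arrived here from sigbus_handler */
    }

    sigbus_armed = 0;
    sigaction(SIGBUS, &old, NULL);
    return ret;
}

/* Hypothetical test helper: map `len` bytes of an empty, unlinked temp
 * file. Touching the mapping raises SIGBUS since no data backs it. */
static void *
map_empty_file(const char *dir, size_t len)
{
    char path[4096];
    void *p;
    int fd;

    if (snprintf(path, sizeof path, "%s/sigbus-sketch-XXXXXX", dir) >=
        (int) sizeof path)
        return NULL;
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}
```

Every access to the mapped buffer would need this kind of guard, which is why reserving the storage at creation time is the simpler route.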

> 
> I think the next question would be whether the mmap is of the expected size,
> and should something throttle/reclaim somewhere to avoid filling the fs.

We already throttle wl_surface_attach, right? So maybe just throttling buffer creation as well would work fine.
Comment 6 Daniel Stone 2018-06-04 07:27:32 UTC
Marking as fixed with a proper autoconf test for posix_fallocate.
