Bug 110774 - Xorg segfault on cpu which does not support PAT
Summary: Xorg segfault on cpu which does not support PAT
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-27 14:17 UTC by Snir Sheriber
Modified: 2019-05-29 07:21 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
gdb bt (8.71 KB, text/plain)
2019-05-27 14:17 UTC, Snir Sheriber
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Snir Sheriber 2019-05-27 14:17:05 UTC
Created attachment 144351 [details]
gdb bt

My setup is as follow:

OS: fedora 30 5.0.11-300
cpu: QEMU Virtual CPU 2.5+ (without PAT - qemu's default)
gpu: assigned vGPU using GVTg 
mesa-dri-drivers: 19.0.3-1
xorg-x11-server-Xorg: 1.20.4-3

(This is a vm running on top of fedora 30 host, same kernel with Intel i7-6600u)

When i'm running inside xorg session and trying to open firefox, xorg crashes. bisecting points to this patch:
54c823ec790427acbea31212a6ed30a17bd25ff0 (mesa)

It seems that in case cpu does not support PAT, mapping may fail (brw_bo_map_wc fails without PAT and won't fallback since MAP_RAW is set) so that intel_miptree_map_raw will use *dst when it's NULL, passing it to linear_to_tiled which is trying to access *dst(=NULL).

Backtrace is attached.
Comment 1 Kenneth Graunke 2019-05-27 22:03:17 UTC
I guess we need to detect this case in brw_bufmgr.c when it sets has_mmap_wc.  Alternatively, the kernel (i915.ko) needs to not advertise WC map support (I915_PARAM_MMAP_VERSION >= 1) when the CPU doesn't support it.

That's pretty unfortunate, I had hoped to assume WC maps were possible.
Comment 2 Chris Wilson 2019-05-28 07:36:23 UTC
If you want the full detection method,

static bool test_has_wc_mmap(struct kgem *kgem)
{
        struct local_i915_gem_mmap2 wc;
        bool ret;

        if (DBG_NO_WC_MMAP)
                return false;

        /* XXX See https://bugs.freedesktop.org/show_bug.cgi?id=90841 */
        if (kgem->gen < 033)
                return false;

        if (gem_param(kgem, LOCAL_I915_PARAM_MMAP_VERSION) < 1)
                return false;

        VG_CLEAR(wc);
        wc.handle = gem_create(kgem->fd, 1);
        wc.offset = 0;
        wc.size = 4096;
        wc.flags = I915_MMAP_WC;
        ret = do_ioctl(kgem->fd, LOCAL_IOCTL_I915_GEM_MMAP_v2, &wc) == 0;
        gem_close(kgem->fd, wc.handle);

        return ret;
}

The parameter just said which version of the struct was supported; you were then meant to query which flags were supported by trial-and-error.

At this point, I would just refuse to run unless WC mmaps were available.
Comment 3 Snir Sheriber 2019-05-28 09:15:22 UTC
The kernel is already failing DRM_IOCTL_I915_GEM_MMAP with I915_MMAP_WC flag if PAT is not supported so that brw_bo_map_wc returns null.

I'm not familiar with this code nor the graphical stack but as i saw on other code parts usually when brw_bo_map is called the returned value is checked, in this case calling brw_bo_map assumes success, which results in segfault.

(BTW is it possible to somehow not to avoid the gtt_map in this case?)
Comment 4 Kenneth Graunke 2019-05-28 18:52:40 UTC
(In reply to Snir Sheriber from comment #3)
> The kernel is already failing DRM_IOCTL_I915_GEM_MMAP with I915_MMAP_WC flag
> if PAT is not supported so that brw_bo_map_wc returns null.
> 
> I'm not familiar with this code nor the graphical stack but as i saw on
> other code parts usually when brw_bo_map is called the returned value is
> checked, in this case calling brw_bo_map assumes success, which results in
> segfault.

No, in almost all cases we assume maps work.  In a few cases we do the bare minimum of punting the problem up to the caller, but in almost no case will it actually work out well.

> (BTW is it possible to somehow not to avoid the gtt_map in this case?)

In theory, we could blit everything via the 3D pipe every time we wanted to map a buffer, but that's expensive and impractical...in general, no, it's not really plausible to avoid mapping.

I could certainly detect this case and fail to load the driver with a message, which is probably preferable to crashing...but do you need this case to work?
Comment 5 Kenneth Graunke 2019-05-28 22:43:20 UTC
(In reply to Kenneth Graunke from comment #4)
> I could certainly detect this case and fail to load the driver with a
> message, which is probably preferable to crashing...but do you need this
> case to work?

Sorry, poorly worded.  The question is more: should we just require that PAT be enabled in QEMU order for virtualized graphics to work?  Or do we need to try and support the non-PAT case?  I don't know enough about the topic to know what is reasonable.
Comment 6 Snir Sheriber 2019-05-29 07:21:12 UTC
(In reply to Kenneth Graunke from comment #5)
> Sorry, poorly worded.  The question is more: should we just require that PAT
> be enabled in QEMU order for virtualized graphics to work?  Or do we need to
> try and support the non-PAT case?  I don't know enough about the topic to
> know what is reasonable.

I think it's not too much of a hassle to require PAT (by default or only when GVTg is used), we can try open an issue asking for that.

My use case not really requires this to work in the non-PAT case, I just think it's better to avoid the crashing if possible

BTW, it looks like it generally works but crashes when i open firefox, so maybe worth to consider a journal warning/err + still using gtt (which will present destroyed image IIUC)


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.