Bug 106677

Summary: vmwgfx: atom (electron-based app) causes corruption, hangs
Product: Mesa Reporter: David Cuthbert <dacut>
Component: Drivers/Gallium/vmwgfx Assignee: mesa-dev
Status: RESOLVED WORKSFORME QA Contact: mesa-dev
Severity: major    
Priority: medium    
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:

Description David Cuthbert 2018-05-28 07:18:41 UTC
I'm filing this currently so I have a place to keep notes on this bug.

Running the Atom text editor under various OSes (tried Linux Mint 18.3, Ubuntu 18.04, and currently using Fedora 28) results in minor screen glitches, eventually followed by drawing going completely haywire. I recompiled vmwgfx.ko from the current HEAD, which resulted in fewer glitches, but they never completely go away.

The hangs are always immediately preceded by:
[drm:vmw_cmdbuf_work_func [vmwgfx]] *ERROR* Command "(null)" causing device error.
[drm:vmw_cmdbuf_work_func [vmwgfx]] *ERROR* Command buffer offset is 28
[drm:vmw_cmdbuf_work_func [vmwgfx]] *ERROR* Command size is 24

With the caveat that I don't know much about the internals here, both the offset and size seem reasonable, so I don't think it's a stream-sync issue (client overrunning a buffer, etc.).

The interesting bit is this is resulting in the 'Command "(null)" causing device error' log instead of 'Unknown command causing device error' immediately above.

Peeling through vmw_cmd_describe:
        u32 cmd_id = ((u32 *) buf)[0];

        if (cmd_id >= SVGA_CMD_MAX) {
                SVGA3dCmdHeader *header = (SVGA3dCmdHeader *) buf;
                const struct vmw_cmd_entry *entry;

                *size = header->size + sizeof(SVGA3dCmdHeader);
                cmd_id = header->id;
                if (cmd_id >= SVGA_3D_CMD_MAX)
                        return false;

                cmd_id -= SVGA_3D_CMD_BASE;
                entry = &vmw_cmd_entries[cmd_id];
                *cmd = entry->cmd_name;
                return true;
        }

This appears to indicate the command being issued is falling between SVGA_CMD_MAX (47) and SVGA_3D_CMD_BASE (1040). I'll add some logic here to try to see what's actually being passed.
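To make the reasoning above concrete, here is a minimal standalone sketch (user-space C, not driver code) of what the excerpt would do with an ID in that gap. The constants are the values quoted in this report and should be treated as assumptions; `classify_cmd_id`, `unchecked_table_index`, and `checked_table_index` are hypothetical helpers that do not exist in vmwgfx, and the upper bound is passed in as a parameter because its real value is not given here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as quoted in this report; assumptions, not checked against headers. */
#define SVGA_CMD_MAX      47u
#define SVGA_3D_CMD_BASE  1040u

enum cmd_class { CMD_CLASS_2D, CMD_CLASS_GAP, CMD_CLASS_3D };

/* Hypothetical helper: which range does a raw command ID fall into? */
static enum cmd_class classify_cmd_id(uint32_t cmd_id)
{
    if (cmd_id < SVGA_CMD_MAX)
        return CMD_CLASS_2D;
    if (cmd_id < SVGA_3D_CMD_BASE)
        return CMD_CLASS_GAP;   /* neither a 2D nor a 3D command */
    return CMD_CLASS_3D;
}

/*
 * Mirrors the subtraction in the excerpt. For an ID in the gap the
 * unsigned subtraction wraps around, giving a huge table index.
 */
static uint32_t unchecked_table_index(uint32_t cmd_id)
{
    return cmd_id - SVGA_3D_CMD_BASE;
}

/*
 * Same lookup with a lower-bound check added. cmd_3d_max stands in for
 * SVGA_3D_CMD_MAX, whose value is not stated in this report.
 */
static bool checked_table_index(uint32_t cmd_id, uint32_t cmd_3d_max,
                                uint32_t *out)
{
    assert(out != NULL);
    if (cmd_id < SVGA_3D_CMD_BASE || cmd_id >= cmd_3d_max)
        return false;
    *out = cmd_id - SVGA_3D_CMD_BASE;
    return true;
}
```

If the ID really is in the gap, the index computed by the excerpt would point far outside the table, so whatever gets printed as the command name is not meaningful. Logging the raw ID in hex before the subtraction would confirm or rule this out.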

vmware.log on the host doesn't show anything interesting here.

Platform details:
uname -a: Linux fedora.seattle.kanga.org 4.16.11-300.fc28.x86_64 #1 SMP Tue May 22 18:29:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
VMware® Workstation 14 Pro 14.1.2 build-8497320
Host OS: Windows 10, 64-bit  (Build 15063) 10.0.15063

Guest is allocated 4 virtual cores, 8 GB (host has 8 hyperthreads/4 cores, 16 GB).

Graphics settings:
Accelerate 3D graphics enabled
Use host setting for monitors
768 MB (recommended) graphics memory allocated
Comment 1 David Cuthbert 2018-05-28 07:23:07 UTC
NB: I'm not sure if this belongs in the Mesa or DRI queue; neither has an exact match for the vmwgfx component. In any case, I'm referring to this tree: https://cgit.freedesktop.org/mesa/vmwgfx/
Comment 2 Deepak 2018-06-13 17:09:14 UTC
(In reply to David Cuthbert from comment #0)
> I'm filing this currently so I have a place to keep notes on this bug.
> 
> Running the atom text editor under various OSes (tried Linux Mint 18.3,
> Ubuntu 18.04, and currently using Fedora 28) results in minor screen
> glitches, eventually followed by drawing going completely haywire. I
> recompiled vmwgfx.ko from the current HEAD which resulted in fewer glitches,
> but it never completely goes away.
> 
> The hangs are always immediately preceded by:
> [drm:vmw_cmdbuf_work_func [vmwgfx]] *ERROR* Command "(null)" causing device
> error.
> [drm:vmw_cmdbuf_work_func [vmwgfx]] *ERROR* Command buffer offset is 28
> [drm:vmw_cmdbuf_work_func [vmwgfx]] *ERROR* Command size is 24
> 

Hi David, thanks for the bug report. Do you see the command buffer error only with the new top-of-tree vmwgfx? I tried to reproduce this bug with a clean Ubuntu 18.04 install and Atom installed from the software center. I see the Atom text editor become unresponsive, but I couldn't see the kernel command buffer errors.

Will try with Fedora 28 later.
Comment 3 Thomas Hellström 2018-06-13 17:50:03 UTC
FWIW, no apparent problems on Fedora Rawhide with 4.18.0-rc0.

/Thomas
Comment 4 David Cuthbert 2018-06-13 18:55:00 UTC
Note that it takes some fiddling to reproduce this currently (the exact trigger isn't known). I can go hours without seeing this issue.

I've been banging my head against the wall trying to get my extra logging to work -- finally realized yesterday that vmwgfx.ko is being loaded in initramfs and not from my filesystem. I'm attempting to reproduce it now with a rebuilt initramfs.
Comment 5 Deepak 2019-08-15 16:58:27 UTC
Unable to duplicate, at least with new top-of-tree Mesa and VMware drivers.
