Created attachment 140221 [details] gdb and xorg output Using xf86-video-vmware 13.2.1 with mesa-18.0.0 (and earlier) and xorg-server-1.19.5 (and earlier) in vmware workstation 14 player works fine. Using both xf86-video-vmware 13.2.1 and 13.3.0 with mesa-18.0.0 and xorg-server-1.20.0 results in a segfault. Note that with the vesa or modesetting drivers and xf86-input-vmmouse it works fine. gdb output and Xorg.0.log attached
Created attachment 140222 [details] gdb output gdb output missing from above
Looks like https://patchwork.freedesktop.org/patch/226987/ should fix this.
Thanks - I should have mentioned that I tried that patch with xorg-server-1.20.0 and it didn't help. I also tried mesa-18.1.2 with no success either.
Thanks for reporting. A quick look at xf86-video-vmware seems like we're on screen close are calling into the crtc subsystem when it has already been taken down. I'll prepare a patch as soon as time allows.
Created attachment 140224 [details] [review] Proposed fix @john, Could you try this patch to xf86-video-vmware? Thanks, Thomas
Thanks for the rapid response. Unfortunately, it doesn't fix things - I tried with both the xf86-video-vmware patch alone and together with the xorg-server patch referenced above.
Indeed. Seems to be some kind of memory corruption in the closescreen path. I'll probably need some more time to debug this.
OK - let me know if you'd like me to do anything at this end.
So this is actually caused by the new leases code, accessing randr_outputs from CloseScreen where they have already been freed. I'll forward to KeithP to come up with a suitable solution. This happens also with the modesetting driver, although it typically doesn't generate a visible error, although valgrind catches it: ==8844== Invalid read of size 8 ==8844== at 0xCEB7D79: drmmode_terminate_leases (drmmode_display.c:3330) ==8844== by 0xCEAFB9C: CloseScreen (driver.c:1816) ==8844== by 0x53AAC3: RRCloseScreen (randr.c:108) ==8844== by 0x4D4BC8: xf86CrtcCloseScreen (xf86Crtc.c:743) ==8844== by 0x48DE01: DGACloseScreen (xf86DGA.c:288) ==8844== by 0x49C3CD: CMapCloseScreen (xf86cmap.c:250) ==8844== by 0x511FB7: XvCloseScreen (xvmain.c:309) ==8844== by 0x4AE21E: xf86XVCloseScreen (xf86xv.c:1168) ==8844== by 0x569F42: present_close_screen (present_screen.c:70) ==8844== by 0x4FD12A: CursorCloseScreen (cursor.c:205) ==8844== by 0x56728A: AnimCurCloseScreen (animcur.c:100) ==8844== by 0x4F7E91: compCloseScreen (compinit.c:86) ==8844== Address 0x15939540 is 352 bytes inside a block of size 368 free'd ==8844== at 0x4C2EDAC: free (vg_replace_malloc.c:530) ==8844== by 0x53AAA0: RRCloseScreen (randr.c:106) ==8844== by 0x4D4BC8: xf86CrtcCloseScreen (xf86Crtc.c:743) ==8844== by 0x48DE01: DGACloseScreen (xf86DGA.c:288) ==8844== by 0x49C3CD: CMapCloseScreen (xf86cmap.c:250) ==8844== by 0x511FB7: XvCloseScreen (xvmain.c:309) ==8844== by 0x4AE21E: xf86XVCloseScreen (xf86xv.c:1168) ==8844== by 0x569F42: present_close_screen (present_screen.c:70) ==8844== by 0x4FD12A: CursorCloseScreen (cursor.c:205) ==8844== by 0x56728A: AnimCurCloseScreen (animcur.c:100) ==8844== by 0x4F7E91: compCloseScreen (compinit.c:86) ==8844== by 0xA4DA544: glxCloseScreen (glxscreens.c:171) ==8844== Block was alloc'd at ==8844== at 0x4C2FB06: calloc (vg_replace_malloc.c:711) ==8844== by 0x53B31C: RRScreenInit (randr.c:308) ==8844== by 0x4E373C: xf86RandR12Init (xf86RandR12.c:870) ==8844== by 0x4D4CD4: xf86CrtcScreenInit (xf86Crtc.c:778) ==8844== by 0xCEAF69E: ScreenInit (driver.c:1660) ==8844== by 0x495A55: xf86ScreenInit (xf86Init.c:317) ==8844== by 0x43998B: AddScreen (dispatch.c:3915) ==8844== by 0x496D39: InitOutput (xf86Init.c:744) ==8844== by 0x43EDA2: dix_main (main.c:193) ==8844== by 0x42136D: main (stubmain.c:34) ==8844== ==8844== Invalid read of size 8 ==8844== at 0xCEB7D88: drmmode_terminate_leases (drmmode_display.c:3330) ==8844== by 0xCEAFB9C: CloseScreen (driver.c:1816) ==8844== by 0x53AAC3: RRCloseScreen (randr.c:108) ==8844== by 0x4D4BC8: xf86CrtcCloseScreen (xf86Crtc.c:743) ==8844== by 0x48DE01: DGACloseScreen (xf86DGA.c:288) ==8844== by 0x49C3CD: CMapCloseScreen (xf86cmap.c:250) ==8844== by 0x511FB7: XvCloseScreen (xvmain.c:309) ==8844== by 0x4AE21E: xf86XVCloseScreen (xf86xv.c:1168) ==8844== by 0x569F42: present_close_screen (present_screen.c:70) ==8844== by 0x4FD12A: CursorCloseScreen (cursor.c:205) ==8844== by 0x56728A: AnimCurCloseScreen (animcur.c:100) ==8844== by 0x4F7E91: compCloseScreen (compinit.c:86) ==8844== Address 0x15939540 is 352 bytes inside a block of size 368 free'd ==8844== at 0x4C2EDAC: free (vg_replace_malloc.c:530) ==8844== by 0x53AAA0: RRCloseScreen (randr.c:106) ==8844== by 0x4D4BC8: xf86CrtcCloseScreen (xf86Crtc.c:743) ==8844== by 0x48DE01: DGACloseScreen (xf86DGA.c:288) ==8844== by 0x49C3CD: CMapCloseScreen (xf86cmap.c:250) ==8844== by 0x511FB7: XvCloseScreen (xvmain.c:309) ==8844== by 0x4AE21E: xf86XVCloseScreen (xf86xv.c:1168) ==8844== by 0x569F42: present_close_screen (present_screen.c:70) ==8844== by 0x4FD12A: CursorCloseScreen (cursor.c:205) ==8844== by 0x56728A: AnimCurCloseScreen (animcur.c:100) ==8844== by 0x4F7E91: compCloseScreen (compinit.c:86) ==8844== by 0xA4DA544: glxCloseScreen (glxscreens.c:171) ==8844== Block was alloc'd at ==8844== at 0x4C2FB06: calloc (vg_replace_malloc.c:711) ==8844== by 0x53B31C: RRScreenInit (randr.c:308) ==8844== by 0x4E373C: xf86RandR12Init (xf86RandR12.c:870) ==8844== by 0x4D4CD4: xf86CrtcScreenInit (xf86Crtc.c:778) ==8844== by 0xCEAF69E: ScreenInit (driver.c:1660) ==8844== by 0x495A55: xf86ScreenInit (xf86Init.c:317) ==8844== by 0x43998B: AddScreen (dispatch.c:3915) ==8844== by 0x496D39: InitOutput (xf86Init.c:744) ==8844== by 0x43EDA2: dix_main (main.c:193) ==8844== by 0x42136D: main (stubmain.c:34) ==8844== In the xf86-video-vmware case, it looks different, and the server segfaults. ==9797== Invalid read of size 8 ==9797== at 0x5430AE: RROutputIsLeased (rrlease.c:123) ==9797== by 0x4D9F82: xf86DPMSSet (xf86Crtc.c:2967) ==9797== by 0xCEB3613: vmwgfx_disable_scanout (vmwgfx_crtc.c:117) ==9797== by 0xCEB1022: drv_leave_vt (vmwgfx_driver.c:1271) ==9797== by 0xCEB1288: drv_close_screen (vmwgfx_driver.c:1327) ==9797== by 0x511FB7: XvCloseScreen (xvmain.c:309) ==9797== by 0x4AE21E: xf86XVCloseScreen (xf86xv.c:1168) ==9797== by 0x5EAA60: SyncCloseScreen (misync.c:156) ==9797== by 0x4FD12A: CursorCloseScreen (cursor.c:205) ==9797== by 0x56728A: AnimCurCloseScreen (animcur.c:100) ==9797== by 0x4F7E91: compCloseScreen (compinit.c:86) ==9797== by 0x569F42: present_close_screen (present_screen.c:70) ==9797== Address 0x1377ba68 is 8 bytes inside a block of size 153 free'd ==9797== at 0x4C2EDAC: free (vg_replace_malloc.c:530) ==9797== by 0x547727: RROutputDestroyResource (rroutput.c:410) ==9797== by 0x46BE1F: doFreeResource (resource.c:880) ==9797== by 0x46C830: FreeClientResources (resource.c:1146) ==9797== by 0x46C91F: FreeAllResources (resource.c:1161) ==9797== by 0x43F170: dix_main (main.c:292) ==9797== by 0x42136D: main (stubmain.c:34) ==9797== Block was alloc'd at ==9797== at 0x4C2DBAB: malloc (vg_replace_malloc.c:299) ==9797== by 0x546967: RROutputCreate (rroutput.c:83) ==9797== by 0x4E5B53: xf86RandR12CreateObjects12 (xf86RandR12.c:1734) ==9797== by 0x4E747A: xf86RandR12Init12 (xf86RandR12.c:2375) ==9797== by 0x4E383E: xf86RandR12Init (xf86RandR12.c:895) ==9797== by 0x4D4CD4: xf86CrtcScreenInit (xf86Crtc.c:778) ==9797== by 0xCEB1D06: drv_screen_init (vmwgfx_driver.c:1200) ==9797== by 0x495A55: xf86ScreenInit (xf86Init.c:317) ==9797== by 0x43998B: AddScreen (dispatch.c:3915) ==9797== by 0x496D39: InitOutput (xf86Init.c:744) ==9797== by 0x43EDA2: dix_main (main.c:193) ==9797== by 0x42136D: main (stubmain.c:34)
Created attachment 140355 [details] [review] During reset/shutdown, clean up leases in DIX instead of each driver Thanks for the careful analysis. I've created a proposed fix which moves lease termination out of the driver and does it during destruction of the CRTC and Output resources instead. This patch has also been posted to the xorg-devel mailing list.
(In reply to Keith Packard from comment #10) > Created attachment 140355 [details] [review] [review] > During reset/shutdown, clean up leases in DIX instead of each driver > > Thanks for the careful analysis. I've created a proposed fix which moves > lease termination out of the driver and does it during destruction of the > CRTC and Output resources instead. This patch has also been posted to the > xorg-devel mailing list. But I don't think the patch catches the case where RROutputIsLeased() dereferences the RROutput pointer which has been freed. (Even if no leases ever existed). I think RROutputIsLeased() needs a screen-wide guard for whether resources have been freed or not....
The fix here is to remove all of the lease-related code from the video driver CloseScreen function. That's where you were seeing calls to RROutputIsLeased being made from. I can't see any other paths which reach RROutputIsLeased or RRCrtcIsLeased during server shutdown -- they're all in mode configuration routines. I have posted a patch which catches the other kind of errors -- calls made before the DIX resources are created during server startup. That's referenced in https://bugs.freedesktop.org/show_bug.cgi?id=106772 but only affects Nouveau in zaphod mode as far as I know. If we do find a path where these pointers are getting used, we can use the above patch along with an additional callback that clears the pointers when the resources are freed. I think that's effectively the same as your suggestion for a global check to see if the resources have been freed?
So the xf86-video-vmware driver hits CloseScreen() LeaveVT() xf86DPMSSet() RROutputIsLeased() -> Access that becomes invalid after resource takedown. Even if it's completely unaware of leases. Will your patch fix that? Thanks, Thomas
I can't tell if it will fix this one -- the randr_crtc and randr_output fields are cleared in xf86CrtcCloseScreen, so if that happens before the driver CloseScreen is called, then all will be well. Hrm. I think we should be clearing the randr_crtc and randr_output fields *before* we do anything else in xf86CrtcCloseScreen -- we know those pointers are invalid due to the resources being freed already. With that and the null checking wrappers around IsLeased, we should be in business. The whole management of the DIX vs DDX resources is a bit of a mess though; the DDX resources are more-or-less permanent while the DIX structures come and go with server generations. In any case, I've posted a two patch series which should fix this problem and cc'd you on it. Let me know what you think, and thanks for pointing out this additional issue.
I'd like to check if the two patch series fixes the problem - where can I find the patches?
The two patches in xorg-server git seem to fix the xf86-video-vmware problem, but there still seems to be a timing related issue with modesetting?
(In reply to Keith Packard from comment #14) > I can't tell if it will fix this one -- the randr_crtc and randr_output > fields are cleared in xf86CrtcCloseScreen, so if that happens before the > driver CloseScreen is called, then all will be well. Hrm. I think we should > be clearing the randr_crtc and randr_output fields *before* we do anything > else in xf86CrtcCloseScreen -- we know those pointers are invalid due to the > resources being freed already. With that and the null checking wrappers > around IsLeased, we should be in business. > > The whole management of the DIX vs DDX resources is a bit of a mess though; > the DDX resources are more-or-less permanent while the DIX structures come > and go with server generations. > > In any case, I've posted a two patch series which should fix this problem > and cc'd you on it. Let me know what you think, and thanks for pointing out > this additional issue. FWIW, the crash still occurs with xf86-video-vmware and xserver-1.20.1 which has https://gitlab.freedesktop.org/xorg/xserver/commit/101d15c
(In reply to Olivier Fourdan from comment #17) > FWIW, the crash still occurs with xf86-video-vmware and xserver-1.20.1 which > has https://gitlab.freedesktop.org/xorg/xserver/commit/101d15c (The equivalent of) drmmode_terminate_leases also needs to be removed from the driver.
(In reply to Michel Dänzer from comment #18) > (The equivalent of) drmmode_terminate_leases also needs to be removed from > the driver. I don't think the vmware driver uses that (can't find any reference to that). The crash occurs in `dixGetPrivate()` called from `RROutputIsLeased()` because output->pScreen is invalid. the vmware driver does: `drv_close_screen()` -> `LeaveVT()` → `drv_leave_vt()` → `vmwgfx_disable_scanout()` → `xf86DPMSSet()` → `xf86OutputIsLeased()` → `RROutputIsLeased()` The RROutput passed to `RROutputIsLeased()` from `xf86OutputIsLeased()` is the xf86OutputPtr's `output->randr_output` but I suspect this is already freed when we get to that point on teardown.
Actually, `drv_close_screen()` should be called after `xf86CrtcCloseScreen()` which does: ``` 763 /* The randr_output and randr_crtc pointers are already invalid as 764 * the DIX resources were freed when the associated resources were 765 * freed. Clear them now; referencing through them during the rest 766 * of the CloseScreen sequence will not end well. 767 */ 768 for (o = 0; o < config->num_output; o++) { 769 xf86OutputPtr output = config->output[o]; 770 771 output->randr_output = NULL; 772 } ``` So the output given to `xf86OutputIsLeased()` should be `NULL` and `RROutputIsLeased()` should be called (as `xf86OutputIsLeased()` check for a `NULL` output->randr_output)... The thing is vmware's `drv_screen_init()` does the `vmwgfx_wrap(ms, pScreen, CloseScreen, drv_close_screen);` before calling `xf86CrtcScreenInit(pScreen)` so I think we are missing attachment 140224 [details] [review] (to move it *after* `xf86CrtcScreenInit(pScreen)` which installs `xf86CrtcCloseScreen()`)
(In reply to Olivier Fourdan from comment #20) > Actually, `drv_close_screen()` should be called after > `xf86CrtcCloseScreen()` which does: > > ``` > 763 /* The randr_output and randr_crtc pointers are already invalid as > 764 * the DIX resources were freed when the associated resources were > 765 * freed. Clear them now; referencing through them during the rest > 766 * of the CloseScreen sequence will not end well. > 767 */ > 768 for (o = 0; o < config->num_output; o++) { > 769 xf86OutputPtr output = config->output[o]; > 770 > 771 output->randr_output = NULL; > 772 } > ``` > Actually, I don't think that will help since then LeaveVT, which turns off the screens will be called AFTER xf86CrtcCloseScreen() and more misery will occur. The root problem is still that the above code setting randr_output to NULL should be called immediately after the screen resources were taken down. Not in xf86CrtcCloseScreen. As an ugly workaround we could perhaps duplicate that code in the vmware driver just before calling LeaveVT.
Created attachment 142585 [details] [review] Patch that potentially fixes the problem I've attached a workaround to xf86-video-vmware that might fix the problem. Could anybody that is experiencing this try it out? Thanks, Thomas
(In reply to Thomas Hellström from comment #22) > Created attachment 142585 [details] [review] [review] > Patch that potentially fixes the problem > > I've attached a workaround to xf86-video-vmware that might fix the problem. > Could anybody that is experiencing this try it out? Changing xf86OutputPtr output's behind xf86 code (I mean in the driver) seems odd. I don't use vmware so I can't help with testing, but changing xf86OutputPtr output's behind xf86 code (I mean in the driver) seems odd.
(In reply to Olivier Fourdan from comment #23) > > Changing xf86OutputPtr output's behind xf86 code (I mean in the driver) > seems odd. > > I don't use vmware so I can't help with testing, but changing xf86OutputPtr > output's behind xf86 code (I mean in the driver) seems odd. I agree. that's why I called it an ugly workaround. But until Keith comes up with a better solution, I guess we have to live with this (provided it works). The problem is that when these resources are destroyed, there is no way to notify the xf86Crtc layer so that it can clean the pointers. A reasonable solution would be either to add such callbacks or refcount the objects so that they are not destroyed until xf86CrtcCloseScreen. Both non-trivial.
Hmm, FWIW, I can't reproduce this with Fedora Rawhide X Server 1.20.3-1.fc30, Perhaps Fedora carries some additional patch to fix this?
(In reply to Thomas Hellström from comment #25) > Hmm, > > FWIW, I can't reproduce this with Fedora Rawhide X Server 1.20.3-1.fc30, > Perhaps Fedora carries some additional patch to fix this? Nope, Fedora is pretty bog standard in this regard, most patches in the xserver are backports for Xwayland and for the vmwre driver, the only patch is for the 1.20 compat (fbGetRotatedPixmap)
Considering the memory is freed but not necessarily changed, I reckon the crash would be random.
Humm, I'm confused now... I added traces to the entry point of the relevant functions and I see, when closing: xf86CrtcCloseScreen() RRCloseScreen() drv_close_screen() vmwgfx_disable_scanout() So by the time we get to `vmwgfx_disable_scanout()`, the `xf86OutputPtr output->randr_output` should be all `NULL` already.... And valgrind doesn't seem to report anything wrong either.
Oh blimey! I found out, it's due to a *downstream* patch which calls LeaveVT() fromxf86CrtcCloseScreen() just *before* the output_randr_output are set to NULL! Completely sorry, all my apologies for the false alert!
Marking this bug as fixed now. Workaround is basically no calls into the xf86crtc module in the closeScreen Path.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.