Bug 29065

Summary: X segfault due to radeon trying to process in reqblank on a client that is freed
Product: xorg Reporter: Alban Browaeys <prahal>
Component: Driver/RadeonAssignee: xf86-video-ati maintainers <xorg-driver-ati>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: chalserogers, oldium.pro, smf.linux
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
hack to show the random freed values instead of crashing.
none
xorg log of a crash (happens way later than the actual issue).
none
a way to fix this vblank and clientgone issue
none
the option 2 version of the vblank when client gone fix.
none
this is option 3 , ie it relies on another patch for dri2 in xserver
none
xserver part of the option 3 patch
none
this is option 3 , ie it relies on another patch for dri2 in xserver (v2)
none
the option 2, updated fix: vblank when client gone event version (v2)
none
the option 2: updated fix sent to xorg-driver-ati mailing list (version 3)
none
the option 2: updated fix sent to xorg-driver-ati mailing list (version 4)
none
the option 2: updated fix sent to xorg-driver-ati mailing list (version 5)
none
the option 2: updated fix sent to xorg-driver-ati mailing list (version 6) none

Description Alban Browaeys 2010-07-14 13:55:49 UTC
Created attachment 37054 [details] [review]
hack to show the random freed values instead of crashing.

radeon send vblank to the kernel with a client pointer in a structure vblwait.request.signal . The client dies , the X server detects it is not responding in its loop. Xserver calls client gone. Client and its pixmaps are freed. The kernel send back the vblank interrupt to xf86 video radeon . In its dri2 handle frame event radeon tries to process the drawable as if the client structure and pixmaps ones were still at their old location (saved into the vblwait passed to the kernel).

I did not switched to drm-next kernel. Will try asap to see if this bug is not by me missing something as told by https://bugs.freedesktop.org/show_bug.cgi?id=28410#c28 .

I made a hack that avoid the crash (attached) and prints the funky values that are at the old location.
Fastest way to reproduce was to use gnome-screensaver-preferences and to switch opengl screensavers (it looks like each preview of screeensaver is a client and switching kills the previous one ).
Comment 1 Alban Browaeys 2010-07-14 13:57:32 UTC
Created attachment 37055 [details]
xorg log of a crash (happens way later than the actual issue).

This is where the xserver ends up crashing.
Comment 2 Alban Browaeys 2010-07-14 14:11:08 UTC
Stack details: 
kernel : 2.6.35-rc5+
xserver: 3209b094a3b1466b579e8020e12a4f3fa78a5f3f with entervt fix from bug 27114 .
libdrm: b803918f3f77c62edf22e78cb2095be399753423
xf86-video-ati : 06691376b1ee963c711420edaf5a03eab6f5658f
mesa : c4066b78c0aad41c199eb27157538c2ec9ab5bfd
Comment 3 Oldrich Jedlicka 2010-08-06 13:37:33 UTC
It looks like I have the same problem (latest git versions of libdrm, mesa, xf86-video-ati, kernel drm-radeon-testing, xorg-server 1.8.2). The crash is often reproducible when I unlock my screen (I'm using GL screensaver). The core file's backtrace looks like:

#0  radeon_dri2_copy_region (drawable=0xa25b4b8, region=0xbff5abe4, dest_buffer=0xa467308, src_buffer=0xa49ae38) at radeon_dri2.c:306
#1  0xb7332733 in radeon_dri2_frame_event_handler (frame=350152, tv_sec=1281125129, tv_usec=640400, event_data=0xa7a8e20) at radeon_dri2.c:405
#2  0xb7335f03 in drmmode_vblank_handler (fd=8, frame=350152, tv_sec=1281125129, tv_usec=640400, event_data=0xa7a8e20)
    at drmmode_display.c:1189
#3  0xb7364b02 in drmHandleEvent (fd=8, evctx=0x9412bf0) at xf86drmMode.c:781
#4  0xb7335ec9 in drm_wakeup_handler (data=0x9412bcc, err=2, p=0x8211d60) at drmmode_display.c:1199
#5  0x080752d1 in WakeupHandler (result=2, pReadmask=0x8211d60) at dixutils.c:403
#6  0x0809dfbe in WaitForSomething (pClientsReady=0x98d00d8) at WaitFor.c:232
#7  0x0806fe36 in Dispatch () at dispatch.c:375
#8  0x0806593d in main (argc=-1074416620, argv=0x8065540, envp=Cannot access memory at address 0x28
) at main.c:286

Which corresponds perfectly to the described scenario.
Comment 4 Oldrich Jedlicka 2010-08-08 13:23:10 UTC
I think this problem of freed ClientPtr could be solved by using some unique identifier (increasing number) - client_id (currently non-existent, client_index isn't enough). The event handler would then ask the server for the ClientPtr to continue. In this way the ClientPtr could be freed, the event handler would not receive the ClientPtr, so it could ignore the request.
Comment 5 Christopher James Halse Rogers 2010-08-10 00:41:06 UTC
*** Bug 29310 has been marked as a duplicate of this bug. ***
Comment 6 Alban Browaeys 2010-08-24 00:25:03 UTC
Created attachment 38119 [details] [review]
a way to fix this vblank and clientgone issue

This is not perfect. I still had issues with it ... like lattice and matrixvew staying assigned to init (running zombies) while gnome-screensaver-preferences was closed . Though I fixed as much as I could at this point . Please test and report issues or improve.
In the end it should go into dri2 layer as discussed on the ML though I believe moving step by step (while keeping a wide view) is easier .
Comment 7 Alban Browaeys 2010-08-24 00:27:03 UTC
Note I did not used a unique identifier that increase ... because ti would have required it to increase indefinitely which is not easy to handle. If we made it loop we would have ended up with the same issue than with client->index ... it behing reused.
Comment 8 Oldrich Jedlicka 2010-08-24 13:46:17 UTC
Just note that Christopher James Halse Rogers is also working on a solution on xorg ATI mailing list (xorg-driver-ati@lists.x.org).

If you want, have a look at the xorg-driver-ati mailing list - 17.8.2010 and further, subject "[PATCH] dri2: Reference-count DRI2 buffers.".
Comment 9 Alban Browaeys 2010-08-25 06:03:12 UTC
(In reply to comment #8)
> Just note that Christopher James Halse Rogers is also working on a solution on
> xorg ATI mailing list (xorg-driver-ati@lists.x.org).
> 
> If you want, have a look at the xorg-driver-ati mailing list - 17.8.2010 and
> further, subject "[PATCH] dri2: Reference-count DRI2 buffers.".

the reference count one or a newer one ? In next patch I implemented the option2 from this thread and I would like it to go option3. I talked with him on irc and he explained me this option 2. I really like the option 3 both of you talked about though I do not know if James is working on it so I will start it nevertheless.
Comment 10 Alban Browaeys 2010-08-25 06:04:44 UTC
Created attachment 38139 [details] [review]
the option 2 version of the vblank when client gone fix.
Comment 11 Alban Browaeys 2010-08-26 05:46:04 UTC
(In reply to comment #8)
> Just note that Christopher James Halse Rogers is also working on a solution on
> xorg ATI mailing list (xorg-driver-ati@lists.x.org).
> 
> If you want, have a look at the xorg-driver-ati mailing list - 17.8.2010 and
> further, subject "[PATCH] dri2: Reference-count DRI2 buffers.".

The  dri2: Reference-count DRI2 buffers I tried yesterday and it leads to random crashes of X in radeon drv after DRI2 layer . Ie it is in xorg-edgers and there I reproduced it . Replacing this patch in the xorg-edgers by the option 2 patch fixes all crashes.
I tried to report to launchpad though it looks like bugs report are not doable without an ubuntu-bug tool, at least for that project.
Comment 12 Oldrich Jedlicka 2010-08-26 11:08:56 UTC
(In reply to comment #10)
> Created an attachment (id=38139) [details]
> the option 2 version of the vblank when client gone fix.

The patch (option 2) looks good and works for me; I like the used idea ;-)

I have one question - I'm not sure if the method radeon_dri2_screen_init is the right place to call AddCallback, because it should be called once per server instance. dixRegisterPrivateKey can be called more times, it doesn't reserve the space twice. But I have to admit that I don't know how many times the method radeon_dri2_screen_init could be called (on the multi-head setup for example) - how many screens could you have on one xorg-server instance...
Comment 13 Alban Browaeys 2010-08-26 18:15:26 UTC
Created attachment 38209 [details] [review]
this is option 3 , ie it relies on another patch for dri2 in xserver

The issue you mentioned in option 2 I agree should be dealt with . Though I completed this partial option 3 which is my favorite (except not as polished as option 2). Which has the advantage that it provides a mechanism for all drivers (and without the event normalization does not remove any flexibility in the driver event structure.

I will attach the xserver dri2 patch to this bug report (seems I cannot add more than one patch in a row).
Comment 14 Alban Browaeys 2010-08-26 18:16:24 UTC
Created attachment 38210 [details] [review]
xserver part of the option 3 patch
Comment 15 Alban Browaeys 2010-08-26 19:19:40 UTC
Created attachment 38211 [details] [review]
this is option 3 , ie it relies on another patch for dri2 in xserver (v2)

obviously when cleaning up a cherry pick from a branch to another one has to take care not to delete more than intended. This version of the patch is complete.
Comment 16 Michel Dänzer 2010-08-27 02:31:29 UTC
I'm leaning towards option 2 - option 3 seems to incur basically all of the versioning pain of moving things into the server while not providing all of the benefits of handling it in the server.
Comment 17 Oldrich Jedlicka 2010-09-06 13:25:13 UTC
Created attachment 38484 [details] [review]
the option 2, updated fix: vblank when client gone event version (v2)
Comment 18 Michel Dänzer 2010-09-07 00:31:41 UTC
(In reply to comment #17)
> Created an attachment (id=38484) [details]
> the option 2, updated fix: vblank when client gone event version (v2)

Please make the commit log more self-explanatory and submit to the mailing list for review, ideally with git send-email but at least generated by git format-patch.
Comment 19 Oldrich Jedlicka 2010-09-14 13:57:52 UTC
Created attachment 38703 [details] [review]
the option 2: updated fix sent to xorg-driver-ati mailing list (version 3)
Comment 20 Michel Dänzer 2010-09-15 02:08:25 UTC
*** Bug 30206 has been marked as a duplicate of this bug. ***
Comment 21 Oldrich Jedlicka 2010-09-24 20:05:03 UTC
Created attachment 38943 [details] [review]
the option 2: updated fix sent to xorg-driver-ati mailing list (version 4)

This is the latest patch (V4) rebased for the current git HEAD. The V4 patch was sent to mailing list on 19.9.2010 too and covers review comments.
Comment 22 Oldrich Jedlicka 2010-10-03 01:27:31 UTC
Created attachment 39128 [details] [review]
the option 2: updated fix sent to xorg-driver-ati mailing list (version 5)

v5: Distribute list.h as xorg_list.h, remove xorg-server version check. Use the version from xorg-server when available (checked in configure.ac).
Comment 23 Oldrich Jedlicka 2010-10-07 09:17:25 UTC
Created attachment 39258 [details] [review]
the option 2: updated fix sent to xorg-driver-ati mailing list (version 6)

v6: Removed xorg_list.h, made DRI2 scheduling features dependent on list.h presence.
Comment 24 Alex Deucher 2010-10-07 09:46:56 UTC
Thanks.  Pushed:
81360adffb2a66b9a95a38671f9227a9718c9841
I guess we can call this bug closed now?
Comment 25 Oldrich Jedlicka 2010-10-07 13:00:31 UTC
(In reply to comment #24)
> I guess we can call this bug closed now?

Yes, I think so. I'm marking it as RESOLVED/FIXED :-)

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.