Created attachment 37054 [details] [review] hack to show the random freed values instead of crashing. radeon send vblank to the kernel with a client pointer in a structure vblwait.request.signal . The client dies , the X server detects it is not responding in its loop. Xserver calls client gone. Client and its pixmaps are freed. The kernel send back the vblank interrupt to xf86 video radeon . In its dri2 handle frame event radeon tries to process the drawable as if the client structure and pixmaps ones were still at their old location (saved into the vblwait passed to the kernel). I did not switched to drm-next kernel. Will try asap to see if this bug is not by me missing something as told by https://bugs.freedesktop.org/show_bug.cgi?id=28410#c28 . I made a hack that avoid the crash (attached) and prints the funky values that are at the old location. Fastest way to reproduce was to use gnome-screensaver-preferences and to switch opengl screensavers (it looks like each preview of screeensaver is a client and switching kills the previous one ).
Created attachment 37055 [details] xorg log of a crash (happens way later than the actual issue). This is where the xserver ends up crashing.
Stack details: kernel : 2.6.35-rc5+ xserver: 3209b094a3b1466b579e8020e12a4f3fa78a5f3f with entervt fix from bug 27114 . libdrm: b803918f3f77c62edf22e78cb2095be399753423 xf86-video-ati : 06691376b1ee963c711420edaf5a03eab6f5658f mesa : c4066b78c0aad41c199eb27157538c2ec9ab5bfd
It looks like I have the same problem (latest git versions of libdrm, mesa, xf86-video-ati, kernel drm-radeon-testing, xorg-server 1.8.2). The crash is often reproducible when I unlock my screen (I'm using GL screensaver). The core file's backtrace looks like: #0 radeon_dri2_copy_region (drawable=0xa25b4b8, region=0xbff5abe4, dest_buffer=0xa467308, src_buffer=0xa49ae38) at radeon_dri2.c:306 #1 0xb7332733 in radeon_dri2_frame_event_handler (frame=350152, tv_sec=1281125129, tv_usec=640400, event_data=0xa7a8e20) at radeon_dri2.c:405 #2 0xb7335f03 in drmmode_vblank_handler (fd=8, frame=350152, tv_sec=1281125129, tv_usec=640400, event_data=0xa7a8e20) at drmmode_display.c:1189 #3 0xb7364b02 in drmHandleEvent (fd=8, evctx=0x9412bf0) at xf86drmMode.c:781 #4 0xb7335ec9 in drm_wakeup_handler (data=0x9412bcc, err=2, p=0x8211d60) at drmmode_display.c:1199 #5 0x080752d1 in WakeupHandler (result=2, pReadmask=0x8211d60) at dixutils.c:403 #6 0x0809dfbe in WaitForSomething (pClientsReady=0x98d00d8) at WaitFor.c:232 #7 0x0806fe36 in Dispatch () at dispatch.c:375 #8 0x0806593d in main (argc=-1074416620, argv=0x8065540, envp=Cannot access memory at address 0x28 ) at main.c:286 Which corresponds perfectly to the described scenario.
I think this problem of freed ClientPtr could be solved by using some unique identifier (increasing number) - client_id (currently non-existent, client_index isn't enough). The event handler would then ask the server for the ClientPtr to continue. In this way the ClientPtr could be freed, the event handler would not receive the ClientPtr, so it could ignore the request.
*** Bug 29310 has been marked as a duplicate of this bug. ***
Created attachment 38119 [details] [review] a way to fix this vblank and clientgone issue This is not perfect. I still had issues with it ... like lattice and matrixvew staying assigned to init (running zombies) while gnome-screensaver-preferences was closed . Though I fixed as much as I could at this point . Please test and report issues or improve. In the end it should go into dri2 layer as discussed on the ML though I believe moving step by step (while keeping a wide view) is easier .
Note I did not used a unique identifier that increase ... because ti would have required it to increase indefinitely which is not easy to handle. If we made it loop we would have ended up with the same issue than with client->index ... it behing reused.
Just note that Christopher James Halse Rogers is also working on a solution on xorg ATI mailing list (xorg-driver-ati@lists.x.org). If you want, have a look at the xorg-driver-ati mailing list - 17.8.2010 and further, subject "[PATCH] dri2: Reference-count DRI2 buffers.".
(In reply to comment #8) > Just note that Christopher James Halse Rogers is also working on a solution on > xorg ATI mailing list (xorg-driver-ati@lists.x.org). > > If you want, have a look at the xorg-driver-ati mailing list - 17.8.2010 and > further, subject "[PATCH] dri2: Reference-count DRI2 buffers.". the reference count one or a newer one ? In next patch I implemented the option2 from this thread and I would like it to go option3. I talked with him on irc and he explained me this option 2. I really like the option 3 both of you talked about though I do not know if James is working on it so I will start it nevertheless.
Created attachment 38139 [details] [review] the option 2 version of the vblank when client gone fix.
(In reply to comment #8) > Just note that Christopher James Halse Rogers is also working on a solution on > xorg ATI mailing list (xorg-driver-ati@lists.x.org). > > If you want, have a look at the xorg-driver-ati mailing list - 17.8.2010 and > further, subject "[PATCH] dri2: Reference-count DRI2 buffers.". The dri2: Reference-count DRI2 buffers I tried yesterday and it leads to random crashes of X in radeon drv after DRI2 layer . Ie it is in xorg-edgers and there I reproduced it . Replacing this patch in the xorg-edgers by the option 2 patch fixes all crashes. I tried to report to launchpad though it looks like bugs report are not doable without an ubuntu-bug tool, at least for that project.
(In reply to comment #10) > Created an attachment (id=38139) [details] > the option 2 version of the vblank when client gone fix. The patch (option 2) looks good and works for me; I like the used idea ;-) I have one question - I'm not sure if the method radeon_dri2_screen_init is the right place to call AddCallback, because it should be called once per server instance. dixRegisterPrivateKey can be called more times, it doesn't reserve the space twice. But I have to admit that I don't know how many times the method radeon_dri2_screen_init could be called (on the multi-head setup for example) - how many screens could you have on one xorg-server instance...
Created attachment 38209 [details] [review] this is option 3 , ie it relies on another patch for dri2 in xserver The issue you mentioned in option 2 I agree should be dealt with . Though I completed this partial option 3 which is my favorite (except not as polished as option 2). Which has the advantage that it provides a mechanism for all drivers (and without the event normalization does not remove any flexibility in the driver event structure. I will attach the xserver dri2 patch to this bug report (seems I cannot add more than one patch in a row).
Created attachment 38210 [details] [review] xserver part of the option 3 patch
Created attachment 38211 [details] [review] this is option 3 , ie it relies on another patch for dri2 in xserver (v2) obviously when cleaning up a cherry pick from a branch to another one has to take care not to delete more than intended. This version of the patch is complete.
I'm leaning towards option 2 - option 3 seems to incur basically all of the versioning pain of moving things into the server while not providing all of the benefits of handling it in the server.
Created attachment 38484 [details] [review] the option 2, updated fix: vblank when client gone event version (v2)
(In reply to comment #17) > Created an attachment (id=38484) [details] > the option 2, updated fix: vblank when client gone event version (v2) Please make the commit log more self-explanatory and submit to the mailing list for review, ideally with git send-email but at least generated by git format-patch.
Created attachment 38703 [details] [review] the option 2: updated fix sent to xorg-driver-ati mailing list (version 3)
*** Bug 30206 has been marked as a duplicate of this bug. ***
Created attachment 38943 [details] [review] the option 2: updated fix sent to xorg-driver-ati mailing list (version 4) This is the latest patch (V4) rebased for the current git HEAD. The V4 patch was sent to mailing list on 19.9.2010 too and covers review comments.
Created attachment 39128 [details] [review] the option 2: updated fix sent to xorg-driver-ati mailing list (version 5) v5: Distribute list.h as xorg_list.h, remove xorg-server version check. Use the version from xorg-server when available (checked in configure.ac).
Created attachment 39258 [details] [review] the option 2: updated fix sent to xorg-driver-ati mailing list (version 6) v6: Removed xorg_list.h, made DRI2 scheduling features dependent on list.h presence.
Thanks. Pushed: 81360adffb2a66b9a95a38671f9227a9718c9841 I guess we can call this bug closed now?
(In reply to comment #24) > I guess we can call this bug closed now? Yes, I think so. I'm marking it as RESOLVED/FIXED :-)
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.