Created attachment 16768 [details] Backtrace of frozen X server On mesa/xserver/drm git, starting compiz locks up the xserver in drmWaitVBlank. Can bring up a new xserver without reboot after remotely killing frozen server. Backtrace attached.
What's the chipset model (915/965?) Please attach Xorg.0.log.
It's a 965. Log will be coming soon.
Created attachment 16777 [details] Xorg startup log
Does it come up if you tell gdb to 'finish' ioctl() and then to 'return ret' from drmWaitVBlank()? If so, this is probably a DRM vblank-rework quirk. I'm working on some fixes for those.
Yep, that looks like the issue. Comes out fine after that.
Moreover, good news. The driver runs compiz for a while when the workaround in Comment #9 of Bug #14441 is applied to xf86-video-intel. Unfortunately, it did crash after a few minutes without leaving a stack trace. I'll look into it tomorrow. (In reply to comment #5) > Yep, that looks like the issue. Comes out fine after that. >
Ben, does this startup hang still occur if you use the latest DRM modules that include Michel's vblank fixes?
Created attachment 16972 [details] GDB log of drm with vblank fixes
Yep, looks like the same bug occurs with 6905c7a29d2a3bc0e605a09b98ac02a4a50893d0. I just attached another gdb backtrace for good measure. Didn't look any different but I might have missed something.
xf86-video-intel may need to call the DRM_IOCTL_MODESET_CTL ioctl appropriately.
Ben, how about using the latest mesa/xserver/drm git?
Nope, just tried latest git mesa (cf29ab3ba075905cca), xserver (3c5a8200111a), and drm (00f549bd5f40) and no dice. Same symptoms so I would assume it's the same issue although I didn't have a chance to throw it on a remote debugger. I can do that later today to verify if necessary. (In reply to comment #11) > Ben, how about using the latest mesa/xserver/drm git? >
Created attachment 17261 [details] [review] don't wait for non-existent page flips Ben, can you give this Mesa patch a try? It turns out the 965 Mesa driver was waiting for a bogus vblank event, because there's some 915 state we try to use that we shouldn't.
Seems to work. I think that did it. (In reply to comment #13) > Created an attachment (id=17261) [details] > don't wait for non-existent page flips > > Ben, can you give this Mesa patch a try? It turns out the 965 Mesa driver was > waiting for a bogus vblank event, because there's some 915 state we try to use > that we shouldn't. >
Ok, at least that works around the issue for you. Still working with Michel on the correct fix though...
(In reply to comment #15) > Ok, at least that works around the issue for you. Still working with Michel on > the correct fix though... > This is still not working for me, without the above patch any GL programs lock up with sync to vblank turned on. With the patch I get no lock up but at when tested with glxgears shows a still image which flickers to black about once per second. If I resize the window you can see old buffer images until the gears resizes to fill the window and again you get a still image or a black window with an occasional flicker of the gears depending on the size of the window. It's like the buffer isn't getting swapped.
(In reply to comment #16) > (In reply to comment #15) > > Ok, at least that works around the issue for you. Still working with Michel on > > the correct fix though... > > > > This is still not working for me, without the above patch any GL programs lock > up with sync to vblank turned on. With the patch I get no lock up but at when > tested with glxgears shows a still image which flickers to black about once per > second. If I resize the window you can see old buffer images until the gears > resizes to fill the window and again you get a still image or a black window > with an occasional flicker of the gears depending on the size of the window. > It's like the buffer isn't getting swapped. > I'm getting a lot of these in my syslog: [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0 Could it be waiting for an interrupt on the wrong pipe?
Could be, when I dug deeper I found quite a few other problems. I'm going to try to get the fixes pushed this week. In the meantime, you can check out the patches: i915-965-pageflip-support.patch, i830-page-flip-vblank.patch mesa-965-pageflip.patch, from this thread on dri-devel: http://sourceforge.net/mailarchive/message.php?msg_name=200806231647.14827.jbarnes%40virtuousgeek.org (it's in the message from the 23rd).
I'm not sure any of those patches directly address this problem. The wait for vblank in question is related to scheduled buffer swaps, so I think either the code for supporting those needs to be implemented/completed/fixed for 965, or this wait (along with any existing related code) needs to be disabled for 965. (When I wrote this code for the i915 driver, it was still separate from i965)
Forgot to mention I removed myself from CC: as I read the xorg-team list.
Right, that's what those patches do. The one attached to this bug effectively removes the wait for swap on 965 (though it could be done better), the others (on the mailing list) actually implement it for 965, which would be preferred.
'It' being page flipping? My point is that this bug isn't directly related to that but to scheduled buffer swaps. Unless I'm missing something, none of the patches discussed here address those.
Arg, now you're going to make me look at all this again... :) IIRC when I did the patchset to dri-devel I found that all of this code was interrelated; this particular bug is the result of LOCK_HARDWARE waiting on a 0 vblank count, since vbl_pending was never updated. The render buffer vbl_pending fields that LOCK_HARDWARE depends on get updated in intelWindowMoved, but only if page flipping is active (pf_num_pages is nonzero), so it's an important field for page flipping. It's also updated in intelScheduleSwap, but it looks like only if the DRM_I915_VBLANK_SWAP ioctl fails, and of course assuming we don't return from intelScheduleSwap early, but there's a problem there, since intelScheduleSwap wants to do LOCK_HARDWARE. To fix plain buffer swapping, maybe we just need to make intelWindowMoved set the vbl_pending fields in the non page flipping case too. So even though my patch set fixes the buffer swap case also, it leaves page flipping & buffer swap conflated, which is probably a bad thing... But this is your code, I could very well be missing some other aspect of this behavior; it's pretty subtle and doesn't seem to be commented anywhere.
(In reply to comment #23) > The render buffer vbl_pending fields that LOCK_HARDWARE depends on get updated > in intelWindowMoved, but only if page flipping is active (pf_num_pages is > nonzero), so it's an important field for page flipping. Why do you say 'only if page flipping is active'? The only conditions I see are if (!intel->intelScreen->driScrnPriv->dri2.enabled && intel->intelScreen->driScrnPriv->ddx_version.minor >= 7) { and if (flags != dPriv->vblFlags && dPriv->vblFlags && !(dPriv->vblFlags & VBLANK_FLAG_NO_IRQ)) { One problem might be that the latter test isn't satisfied initially, so the field doesn't get initialized? > It's also updated in intelScheduleSwap, but it looks like only if the > DRM_I915_VBLANK_SWAP ioctl fails, The other way around - it's only necessary to wait for a swap when one has actually been scheduled. > and of course assuming we don't return from intelScheduleSwap early, but > there's a problem there, since intelScheduleSwap wants to do LOCK_HARDWARE. Why is that a problem? Anyway, it only returns early if it's not possible/necessary to schedule a swap. > So even though my patch set fixes the buffer swap case also, it leaves page > flipping & buffer swap conflated, which is probably a bad thing... Yes, it merely avoids the problem because page flipping happens to be disabled. But the two concepts aren't directly related (other than flips can be scheduled as well). > But this is your code, I could very well be missing some other aspect of this > behavior; it's pretty subtle and doesn't seem to be commented anywhere. Yeah, sorry about that. Hope this clears up some things, or I'm happy to answer more questions.
On Wednesday, July 9, 2008 12:29:12 am bugzilla-daemon@freedesktop.org wrote: > > The render buffer vbl_pending fields that LOCK_HARDWARE depends on get > > updated in intelWindowMoved, but only if page flipping is active > > (pf_num_pages is nonzero), so it's an important field for page flipping. > > Why do you say 'only if page flipping is active'? The only conditions I see > are > > if (!intel->intelScreen->driScrnPriv->dri2.enabled && > intel->intelScreen->driScrnPriv->ddx_version.minor >= 7) { > > and > > if (flags != dPriv->vblFlags && dPriv->vblFlags && > !(dPriv->vblFlags & VBLANK_FLAG_NO_IRQ)) { Yeah, that's the general code, but the specific update for vbl_pending is: for (i = 0; i < intel_fb->pf_num_pages; i++) { if (intel_fb->color_rb[i]) intel_fb->color_rb[i]->vbl_pending = intel_fb->vbl_waited; } > One problem might be that the latter test isn't satisfied initially, so the > field doesn't get initialized? I think it's just that no one ever tested/implemented this stuff on 965, so while some of the shared stuff has been fixed up, most of it was really i915 code (or something, I haven't gone through the history in any detail). > > It's also updated in intelScheduleSwap, but it looks like only if the > > DRM_I915_VBLANK_SWAP ioctl fails, > > The other way around - it's only necessary to wait for a swap when one has > actually been scheduled. Oh duh, right. > > and of course assuming we don't return from intelScheduleSwap early, but > > there's a problem there, since intelScheduleSwap wants to do > > LOCK_HARDWARE. > > Why is that a problem? Anyway, it only returns early if it's not > possible/necessary to schedule a swap. I'll test again on my 965GM, iirc I wasn't even getting to the buffer swap schedule ioctl in some cases... > > So even though my patch set fixes the buffer swap case also, it leaves > > page flipping & buffer swap conflated, which is probably a bad thing... > > Yes, it merely avoids the problem because page flipping happens to be > disabled. But the two concepts aren't directly related (other than flips > can be scheduled as well). > > > But this is your code, I could very well be missing some other aspect of > > this behavior; it's pretty subtle and doesn't seem to be commented > > anywhere. > > Yeah, sorry about that. Hope this clears up some things, or I'm happy to > answer more questions. Yeah, thanks. As I fix things on 965 I'll try to comment the various methods (or at least the fields involved), since eventually I'd like to just use scheduled swaps or page flips by default... Thanks, Jesse
(In reply to comment #25) > [...] the specific update for vbl_pending is: > > for (i = 0; i < intel_fb->pf_num_pages; i++) { > if (intel_fb->color_rb[i]) > intel_fb->color_rb[i]->vbl_pending = intel_fb->vbl_waited; > } Ah. The intention is for pf_num_pages to be 2 (or 3 with triple buffering enabled) regardless of whether page flipping is enabled. See intelUpdatePageFlipping(). > > > It's also updated in intelScheduleSwap, but it looks like only if the > > > DRM_I915_VBLANK_SWAP ioctl fails, > > > > The other way around - it's only necessary to wait for a swap when one has > > actually been scheduled. > > Oh duh, right. That said, it might make sense to set vbl_pending = vbl_waited if intelScheduleSwap() returns GL_FALSE, to prevent them from diverging too much, eventually resulting in wait for vblank ioctl timeouts. > > > and of course assuming we don't return from intelScheduleSwap early, but > > > there's a problem there, since intelScheduleSwap wants to do > > > LOCK_HARDWARE. > > > > Why is that a problem? Anyway, it only returns early if it's not > > possible/necessary to schedule a swap. > > I'll test again on my 965GM, iirc I wasn't even getting to the buffer swap > schedule ioctl in some cases... Looking at intelScheduleSwap(), that should only happen if sync to vblank is effectively disabled.
(In reply to comment #26) > (In reply to comment #25) > > [...] the specific update for vbl_pending is: > > > > for (i = 0; i < intel_fb->pf_num_pages; i++) { > > if (intel_fb->color_rb[i]) > > intel_fb->color_rb[i]->vbl_pending = intel_fb->vbl_waited; > > } > > Ah. The intention is for pf_num_pages to be 2 (or 3 with triple buffering > enabled) regardless of whether page flipping is enabled. See > intelUpdatePageFlipping(). Yeah, I think that's the real root cause here. Somehow when this code was unified the actual initialization of pf_num_pages got lost, which means we never update vbl_pending which means we hang in LOCK_HARDWARE. > > > > It's also updated in intelScheduleSwap, but it looks like only if the > > > > DRM_I915_VBLANK_SWAP ioctl fails, > > > > > > The other way around - it's only necessary to wait for a swap when one has > > > actually been scheduled. > > > > Oh duh, right. > > That said, it might make sense to set vbl_pending = vbl_waited if > intelScheduleSwap() returns GL_FALSE, to prevent them from diverging too much, > eventually resulting in wait for vblank ioctl timeouts. Please see the attached patch. Along with the recent fixes to the DRM, it makes my 965 happy again with vblank_mode=2 or 3. If you think it makes sense I could also rename pf_num_pages to num_render_buffers or something and set it outside of intelUpdatePageFlipping() to split the buffer swap & page flip code a bit better... Thanks, Jesse
Created attachment 17810 [details] [review] Enable page flip & buffer swap code on 965
(In reply to comment #27) > > If you think it makes sense I could also rename pf_num_pages to > num_render_buffers or something and set it outside of intelUpdatePageFlipping() > to split the buffer swap & page flip code a bit better... I don't think it matters what I think there. :) The patch looks good, except for one thing: > driWaitForVBlank(dPriv, &missed_target); > + dPriv->vblSeq = missed_target; missed_target is a boolean which indicates whether the target sequence was missed. driWaitForVBlank() already updated dPriv->vblSeq to the current sequence number.
> --- Comment #29 from Michel Dänzer <michel@tungstengraphics.com> > 2008-07-21 23:58:14 PST --- (In reply to comment #27) > > > If you think it makes sense I could also rename pf_num_pages to > > num_render_buffers or something and set it outside of > > intelUpdatePageFlipping() to split the buffer swap & page flip code a bit > > better... > > I don't think it matters what I think there. :) Hey, I value your opinion! :) > The patch looks good, except for one thing: > > driWaitForVBlank(dPriv, &missed_target); > > + dPriv->vblSeq = missed_target; > > missed_target is a boolean which indicates whether the target sequence was > missed. driWaitForVBlank() already updated dPriv->vblSeq to the current > sequence number. Ah ok, I'll drop that bit then. Thanks.
Fixed in 97988ccc46c0248177cd71937021ca8cc2a7452b.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.