Created attachment 89185 [details] DMESG output with extended drm logging turned on So with the recent release of 2.99.906, I've run into a serious issue. Namely XOrg is crashing - alot. Downgrading to 2.99.904 resolves the issue though I wasn't having this problem a few weeks ago with an interim build. The easiest way for me to replicate this is: [1] Log into GDM [2] Wait for Cinnamon to start [3] Click on the Thunderbird icon in my docky dock Around 50% of the time, this will cause XOrg to die and gdm will restart it. If this doesn't cause XOrg to die, goofing around for a few minutes in other apps will eventually cause it. In any event when Thunderbird dies this way it's always in the process of "zooming the window in" from the center of the screen. I've attached the relevant XOrg log (though it's pretty useless). I've also attached my dmesg dump as I had turned on extended drm logging in preparation for replicating and opening a bug on the issue this morning.
Created attachment 89186 [details] XOrg log file that was created when the crash occurred
Yeah, no indication of failure there. Try attaching gdb (a second computer is best).
Unfortunately I'm not in a position to do that :( I guess I'll have to hope somebody else can replicate the bug. If I could narrow down to a specific commit, would that help?
That would be a reasonable approach. Also look if anything is captured in other logs like /var/log/gdm/:0.log*
There is nothing worth seeing in the GDM logs. I've narrowed it down to a specific commit. The issue starts with the following commit: [f3225fcb38686f3b9701725bf3a11ecf1c100c3f] sna: Be move conservative with tiling sizes for older fenced gen: Crash http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=f3225fcb38686f3b9701725bf3a11ecf1c100c3f I'm testing the commit made directly prior to make absolutely sure it does not have the issue. So far so good.
Yeah so far the previous commit seems rock solid. So I'll be sticking with that one for the time being.
False positive. That code is not even executed for your system and cannot cause the behaviour you suggest.
If it is that frequent, please compile with --enable-debug=full and attach the debug log leading up to the crash.
It's definitely something in that commit. I have no doubts about it. The commit immediately prior has been rock solid. In the bugged versions getting X to crash required virtually no effort. If opening up Thunderbird after initially logging in didn't do it, opening a few tabs in the browser, scrolling around a bit and then opening thunderbird again would do it. I will recompile the bugged version with debugging and provide the results.
I am unable to replicate the issue when the driver is compiled in debug mode. X runs quite sluggish but absolutely refuses to crash. As soon as I switch to the same driver not compiled in debug mode I can replicate the issue.
Just try with --enable-debug then and look for any assertion failure in /var/log/gdm/ (or friends).
I'm curious - in regards to: if (kgem->gen < 033) Would that evaluate to true or false for my hardware? I'm assuming based on your statements here it would evaluate to false. What would the value of kgem->gen be for my hardware?
I think based on what I'm seeing in intel_module.c, the gen for my haswell hardware is going to be 075. If that's true it stands to reason that a patch like the following would change the behavior of the driver for my hardware: - size *= 2; + if (kgem->gen < 033) + size *= 2; Instead of size always being doubled, it would now only be doubled for earlier generations of the hardware.
(In reply to comment #13) > I think based on what I'm seeing in intel_module.c, the gen for my haswell > hardware is going to be 075. > > If that's true it stands to reason that a patch like the following would > change the behavior of the driver for my hardware: > > - size *= 2; > + if (kgem->gen < 033) > + size *= 2; > > Instead of size always being doubled, it would now only be doubled for > earlier generations of the hardware. Apart from that entire block is only evaluate if gen < 040.
Created attachment 89197 [details] XOrg log of crash with debugging enabled (not full debugging)
Created attachment 89198 [details] GDM log of crash with debugging enabled (not full debugging)
It would help immensely to have debug symbols, addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x239f1 0x26205 0xb5971 0xb5949
I will provide that momentarily ;) Note: I'm switching between drivers so that my laptop will remain at least somewhat useable during this process. This code in the patch looked a bit odd: - size = 3*kgem->aperture_fenced; - if (kgem->aperture_total == kgem->aperture_mappable) - size += kgem->aperture; - if (size > kgem->aperture_mappable && - kgem_ring_is_idle(kgem, kgem->ring)) - return false; + if (kgem->aperture_fenced) { + size = 3*kgem->aperture_fenced; + if (kgem->aperture_total == kgem->aperture_mappable) + size += kgem->aperture; + if (size > kgem->aperture_mappable && + kgem_ring_is_idle(kgem, kgem->ring)) { + DBG(("%s: opportunistic fence flush\n", __FUNCTION__)); + return false; + } + } size = kgem->aperture_fenced; It almost feels like that last line ought to have been deleted. Prior to this revision if execution got here and kgem->aperture_fenced == 0 then I'm guessing "return false" would've been executed. In this revision this situation would be allowed to continue.
It's a flush at a low watermark. If the batch has no other fences, then we need to be careful that we only evaluate this command (and its fenced bo) against the high watermark. Otherwise we reject operations that could actually be handled by an early flush and starting with an empty batch.
The command you gave me doesn't seem to produce any useful output: addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x239f1 0x26205 0xb5971 0xb5949 ??:0 ??:0 ??:0 ??:0
Your installation stripped all of the debug symbols from the drivers. Either correct the install process to preserve the symbols (avoid stripping), or point addr2line to the original intel_drv.so in the build path.
Yes it did - running it against the binary in the build directory did the trick. I added the -f option as well: addr2line -e ./intel_drv.so -i -f 0x239f1 0x26205 0xb5971 0xb5949 kgem_bo_set_purgeable kgem.c:? __kgem_bo_destroy kgem.c:? sna_tiling_blt_copy_boxes :? sna_tiling_blt_copy_boxes :?
You win. commit 6e9a8c5ae2883ca21d117ac672dd8a55b3429dc1 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Nov 14 16:14:09 2013 +0000 sna: Add the missing braces around the conditional block Fixes regression from commit f3225fcb38686f3b9701725bf3a11ecf1c100c3f Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Nov 5 08:38:22 2013 +0000 sna: Be move conservative with tiling sizes for older fenced gen Reported-by: Jay Little <jaylittle@jaylittle.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71605 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
My initial testing results are very promising. I'll continue to use the driver over the next few hours and see how it goes. Also, thanks a lot for your help here. It's a real pleasure working with you Chris!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.