Description
Tyler Foo
2014-04-05 02:50:44 UTC
Created attachment 96929 [details]
journalctl output of aegisub
Output from "journalctl | grep -i 'aegisub'"
This only happens when I set "AccelMethod" to "sna".

I at least need the Xorg.0.log containing the stacktrace. Preferably it will have its symbols resolved. As you can reproduce this, please try and attach gdb.

Created attachment 96934 [details]
Xorg.0.log.old file when the crash happened
Created attachment 96935 [details]
Xorg.0.log file after the crash
(In reply to comment #3)
> I at least need the Xorg.0.log containing the stacktrace. Preferably it
> will have its symbols resolved. As you can reproduce this, please try and
> attach gdb.

Sorry, this seems quite complicated to me, but when I have time, I'll certainly try it. In the meantime, please check the Xorg.0.log.old file. Thanks.

Ok, since the backtrace contains no symbols, can you try

addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x5ee02 0x4ba12

It doesn't tend to be as accurate as gdb, but it may help in the interim. Note that this also requires the debug packages for Xorg and xf86-video-intel to be installed.

Simply loading aegisub doesn't crash for me (at least on the first machine that installed it) - is there any particular sequence required or a particular file?

(In reply to comment #7)
> Ok, since the backtrace contains no symbols, can you try
>
> addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x5ee02 0x4ba12
>
> It doesn't tend to be as accurate as gdb, but it may help in the interim.
> Note that this also requires the debug packages for Xorg and
> xf86-video-intel to be installed.
>
> Simply loading aegisub doesn't crash for me (at least on the first machine
> that installed it) - is there any particular sequence required or a
> particular file?

Can you tell me specifically which debug packages are required? I'm running Arch Linux.

About the crash, it didn't just happen with a particular sequence or a particular file. Most of the time it happened when I was editing or reviewing subtitles. Also, most of the video files I was working with were downloaded from YouTube, so they were encoded with ffh264 for video and ffaac for audio. Also, my graphics card is a Haswell HD 4400, if that helps.

I found this https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces which suggests you can use "pacman -Qo /usr/lib/xorg/modules/drivers/intel_drv.so" to get the name of the package to rebuild with debug symbols. Then it looks like you need to manually tweak the package manifest to keep the debug symbols, rebuild and then reinstall the package.

(In reply to comment #9)
> I found this https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces
> which suggests you can use "pacman -Qo
> /usr/lib/xorg/modules/drivers/intel_drv.so" to get the name of the package
> to rebuild with debug symbols. Then it looks like you need to manually tweak
> the package manifest to keep the debug symbols, rebuild and then reinstall
> the package.

Ok, I reinstalled the self-compiled version of xf86-video-intel, now what? I mean, how do I get the debug info that you need? I have gdb installed, but I'm not quite sure how to use it.

With any luck the addresses are still valid, and addr2line (part of binutils) should work. Using gdb requires a lot of heartache or a second computer. Using a second computer is much easier...

Created attachment 96945 [details]
addr2line output
(In reply to comment #11)
> With any luck the addresses are still valid, and addr2line (part of
> binutils) should work. Using gdb requires a lot of heartache or a second
> computer. Using a second computer is much easier...

Ok, I just attached the addr2line output.

Hmm, I think my hope that the addresses would match up in the new package was false. Do you mind reproducing the crash with the new driver and attaching the Xorg.0.log file? If you want to jump ahead to running addr2line, be my guest! You need to feed it the hex offsets (the part after the '+') for each intel_drv.so frame in the stack. Otherwise I can tell you what addr2line to execute, thanks.

(In reply to comment #14)
> Hmm, I think my hope that the addresses would match up in the new package
> was false. Do you mind reproducing the crash with the new driver and attaching
> the Xorg.0.log file? If you want to jump ahead to running addr2line, be my
> guest! You need to feed it the hex offsets (the part after the '+') for
> each intel_drv.so frame in the stack. Otherwise I can tell you what
> addr2line to execute, thanks.

Wow...this is crazy. I've been playing with it for over half an hour now and it still hasn't crashed. In this case it's really not a good thing. ;)

Created attachment 96949 [details]
Xorg log with the self-compiled version of xf86-video-intel
(In reply to comment #14)
> Hmm, I think my hope that the addresses will match up in the new package
> were false. Do you mind reproducing the crash with the new driver and attach
> the Xorg.0.log file? If you want to jump ahead to running addr2line, be my
> guest! You need to feed it the hex offsets (the part after the '+') for
> each intel_drv.so frame in the stack. Otherwise I can tell you what
> addr2line to execute, thanks.

Ok, it's finally crashed. Check out the Xorg log.

Please run:

addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x5ee02 0x4ba12

Created attachment 96953 [details]
addr2line output number 2
Seems the result is the same as last one. Did I do something wrong?
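The earlier advice, to feed addr2line the hex offsets (the part after the '+') for each intel_drv.so frame, can be sketched as a small filter. This is only an illustration: the function name intel_offsets and the log path in the usage comment are assumptions, and it presumes backtrace lines of the form "module (base+offset) [address]" as Xorg prints them for frames without symbols.

```shell
#!/bin/sh
# Print the module-relative offset (the hex value after the '+') for every
# intel_drv.so frame found in an Xorg backtrace read from stdin.
# Lines that merely mention the driver (e.g. "Loading ...intel_drv.so")
# carry no "(base+offset)" pair and are skipped.
intel_offsets() {
    sed -n 's/.*intel_drv\.so (0x[0-9a-f]*+\(0x[0-9a-f]*\)).*/\1/p'
}

# Hypothetical usage; the log location is an assumption and differs
# between setups (journald, rootless X, display managers):
#   addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i \
#       $(intel_offsets < /var/log/Xorg.0.log.old)
```

The offsets only resolve to file and line if the installed intel_drv.so still carries its debug symbols; against a stripped binary addr2line just prints ??:0.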
Ok, that is actually consistent against 2.99.911 (I looked against master originally where it didn't make sense). I'm still considering how it fails though, my first thought is that the region is broken due to an allocation failure which then causes REGION_NUM_RECTS() to die.

(In reply to comment #20)
> Ok, that is actually consistent against 2.99.911 (I looked against master
> originally where it didn't make sense). I'm still considering how it fails
> though, my first thought is that the region is broken due to an allocation
> failure which then causes REGION_NUM_RECTS() to die.

If you need anything else to help you figure it out, just ask. Thank you!

Bizarre, I still can't see how we would get a NULL pointer dereference there. Can you please look into changing the configure line within the package build script to include --enable-debug and reinstall? That will add assertion checks which might help catch the issue earlier. Most helpful would be to add --enable-debug=full, but that might prevent the bug entirely.

(In reply to comment #22)
> Bizarre, I still can't see how we would get a NULL pointer dereference
> there. Can you please look into changing the configure line within the
> package build script to include --enable-debug and reinstall? That will add
> assertion checks which might help catch the issue earlier. Most helpful
> would be to add --enable-debug=full, but that might prevent the bug entirely.

Ok, I actually got the file, but it was too big to upload (2.5G), any suggestion?

I uploaded the file to Dropbox. Here is the link: https://www.dropbox.com/s/eud5zatjsgyh09e/Xorg.0.log.old.zip

Thanks, but I don't think that is the original bug...

commit f5014b3fddf6c79f5ca01a91eec5ca92184c8829
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 7 07:59:01 2014 +0100

    sna: Avoid double application of pixel widening for degenerate lines

    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

will fix the assertion failure in the full-debug log.

(In reply to comment #25)
> Thanks, but I don't think that is the original bug...

What?! Do you need me to do it again?

I would just run with xf86-video-intel.git for a while and see if the bug reoccurs first.

(In reply to comment #27)
> I would just run with xf86-video-intel.git for a while and see if the bug
> reoccurs first.

Been running the git version for a while. Did some subtitle editing in Aegisub, no crashes so far.

Let's assume fixed until proven otherwise. Please do reopen if it dies again.

Hi I am also seeing this bug..
Aegisub 3.0.4
Lubuntu 14.04

Fix was to change to UXA acceleration.. As advised:
https://bugs.archlinux.org/task/39739

However Aegisub would crash xorg after 1+ mins of playback in standard SNA mode.

Is there an upstream fix?

(In reply to comment #30)
> Hi I am also seeing this bug..
> Aegisub 3.0.4
> Lubuntu 14.04
>
> Fix was to change to UXA acceleration.. As advised:
> https://bugs.archlinux.org/task/39739
>
>
> However Aegisub would crash xorg after 1+ mins of playback in standard SNA
> mode.
>
> Is there a upstream fix?

YES! The bug you reopened is about the fix.

Great! How quickly do the changes get pulled into mainstream? As I see this was fixed a few months ago.. Here is the crash log if you're interested https://www.dropbox.com/s/8154w7p95kvq8le/_usr_bin_Xorg.0.crash

Obviously not as quickly as I would like.

Hi Chris, the problem seems to be back. I'm running Aegisub 3.2.0-2 and xf86-video-intel 2.99.914.60.gf546968-1 on Arch Linux.

Are you sure? Have you captured the updated debug information?

(In reply to comment #35)
> Are you sure?
> Have you captured the updated debug information?

Yes, it crashes like hell. I have to compile xf86-video-intel again. Will get the debug info for you later.

It's weird that the Xorg log file does not even update... I re-compiled the xf86-video-intel-git package with options=(debug !strip), and it did crash. Do you know what I might be doing wrong?

You are using journald which hides logfiles and requires new incarnations to retrieve?

(In reply to comment #38)
> You are using journald which hides logfiles and requires new incarnations to
> retrieve?

$ journalctl -b | grep -i 'intel_drv.so'
Aug 21 15:11:55 archins3437 gdm-Xorg-:0[512]: (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
Aug 21 15:26:46 archins3437 gdm-Xorg-:0[512]: (EE) 3: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fc26e9ec000+0x61e0e) [0x7fc26ea4de0e]
Aug 21 15:26:46 archins3437 gdm-Xorg-:0[512]: (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fc26e9ec000+0x4ddcf) [0x7fc26ea39dcf]
Aug 21 15:26:47 archins3437 gdm-Xorg-:0[1545]: (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
Aug 21 15:26:49 archins3437 gdm-Xorg-:1[1563]: (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so

$ addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x4ddcf 0x61e0e
/home/tf/sandbox/xf86-video-intel-git/src/xf86-video-intel-git/src/sna/sna_damage.h:49
/home/tf/sandbox/xf86-video-intel-git/src/xf86-video-intel-git/src/sna/sna_accel.c:10757
/home/tf/sandbox/xf86-video-intel-git/src/xf86-video-intel-git/src/sna/sna_damage.c:65
/home/tf/sandbox/xf86-video-intel-git/src/xf86-video-intel-git/src/sna/sna_damage.c:661

Back to building with debug=full and extracting the full debug log then...

(In reply to comment #41)
> Back to building with debug=full and extracting the full debug log then...

I forgot how to add debug=full to the Arch PKGBUILD file. Simply putting it into the options array as options=(debug=full !strip) didn't work.

Can you manually add --enable-debug=full to the build script? Or just build it by hand, which I think will be very easy on Arch anyway since all the headers are already there, and so you just need ./configure --prefix=/usr --enable-debug=full

(In reply to comment #43)
> Can you manually add --enable-debug=full to the build script? Or just build
> it by hand, which I think will be very easy on arch anyway since all the
> headers are already there, and so you just need ./configure --prefix=/usr
> --enable-debug=full

I don't know if I'm doing it right. I added --enable-debug=full to the PKGBUILD file (see the attached file). I think it's working, 'cause my laptop is running heavily. Everything is slow. But I can't seem to get the log file I want. When I do "journalctl /usr/bin/Xorg", the latest log entries only go up to 2014-08-02.

Created attachment 105036 [details]
PKGBUILD
Ok, step 1 complete. I have no idea how to use journald though. (In reply to comment #46) > Ok, step 1 complete. I have no idea how to use journald though. Yep, it's driving me nuts. See the attached file, generated with "journalctl -r --since="19:00" > xorg.log". It's the best I can do. There was a crash around 19:20. Created attachment 105040 [details]
xorg log file
I've built aegisub-3.2.0. Can you give me a crash course in reproducing the explosion?

(In reply to comment #49)
> I've built aegisub-3.2.0. Can you give me a crash course in reproducing the
> explosion?

It was an audio file in flac format that I was using. I also tried converting it to ac3, no difference. I didn't do too much. Simply working on the time frame would cause Xorg to crash. In my case it didn't take long.

Now consider that I have never used aegisub before...

(In reply to comment #51)
> Now consider that I have never used aegisub before...

Ok, I did a screencast on how I use it. Here is the dropbox link: https://www.dropbox.com/s/3182zruiprbph41/screencast.mkv?m= Hope it helps.

I messed around a few times now, taking similar steps as you showed in the cast. Nothing so far. Could you try reproducing the crash with --enable-debug=full? It will be much slower, has a chance of hiding the bug, and will generate a huge logfile - but I don't have a better idea yet.

(In reply to comment #53)
> I messed around a few times now, taking similar steps as you showed in the
> cast. Nothing so far. Could you try reproducing the crash with
> --enable-debug=full? It will be much slower, has a chance of hiding the bug,
> and will generate a huge logfile - but I don't have a better idea yet.

I think I got the file. But it's too big, and Dropbox is blocked again here in my country. Google Drive won't work either. And surprisingly neither does OneDrive. Any suggestions?

How big is it after xz? Or try:

head -1500 Xorg.0.log > xorg.trunc
tail -3000 Xorg.0.log >> xorg.trunc

and then compress it.

Created attachment 105888 [details]
xorg debug log
Ok. Those commands are amazing. No need to compress anymore. Here it is.
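The head/tail trick used above can be wrapped in a small helper. A minimal sketch, assuming a POSIX shell: trim_log is a hypothetical name, and the 1500/3000 defaults are simply the values suggested in this thread. It uses the `-n` spelling of the same head and tail options.

```shell
#!/bin/sh
# trim_log FILE [HEAD_LINES] [TAIL_LINES]
# Write the first HEAD_LINES lines and then the last TAIL_LINES lines of
# FILE to stdout, so a multi-gigabyte debug log shrinks to its startup
# section plus the moments just before the crash.
# Note: if FILE has fewer than HEAD_LINES + TAIL_LINES lines, the
# overlapping lines are printed twice.
trim_log() {
    head -n "${2:-1500}" "$1"
    tail -n "${3:-3000}" "$1"
}

# Hypothetical usage:
#   trim_log Xorg.0.log > xorg.trunc
```

Keeping both ends matters here: the head records the driver version and configuration, while the tail holds the assertion or backtrace.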
(In reply to comment #56)
> Created attachment 105888 [details]
> xorg debug log
>
> Ok. Those commands are amazing. No need to compress anymore. Here it is.

That looks like a u16 underflow. Do you have the stderr available? Usually /var/log/gdm/:0.log or similar.

This should test the underflow theory:

commit 30932a7b9d255c2037bee19e01aa3edc37b07386
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 8 12:41:06 2014 +0100

    sna: Avoid u16 underflow when computing reserved batch space

    If we filled the batch exactly, then subtract -1 for the reserved
    BATCH_BUFFER_END, it would underflow to a large value - convincing us
    that we had sufficient room to stuff many, many more commands in.
    However, all the callsites should be guarded by checking already that
    they had sufficient space to emit at least one operation...

    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

The commit just makes it worse. It crashes my X even when I'm not using Aegisub.

Cool! Is it possible for you to get a stacktrace or updated debug log? Please?

Sigh. It took loading aegisub for me, but it then promptly crashed:

commit e0f7e9fc2f0b39b9e939ff48edea29950f125420
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 8 16:49:29 2014 +0100

    sna: Initialise and check for batch space

    commit 30932a7b9d255c2037bee19e01aa3edc37b07386
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Mon Sep 8 12:41:06 2014 +0100

        sna: Avoid u16 underflow when computing reserved batch space

    relied on gcc a little to much to warn me when I missed initialising
    'rem'

    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Now any crashes should be interesting!

Created attachment 105935 [details]
xorg debug log with the latest git commit.
Just tested the latest git commit. X still crashed when using Aegisub. Here is the log file with full debug.
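The u16 underflow described in the commit message above (a full batch, then one more slot subtracted for BATCH_BUFFER_END) can be illustrated with plain shell arithmetic. This is not the driver's code: the function names are hypothetical, and masking with 0xFFFF is only a stand-in for what an unsigned 16-bit variable does in C.

```shell
#!/bin/sh
# Emulate an unsigned 16-bit subtraction by keeping only the low 16 bits
# of the result, as a uint16_t would.
u16_sub() {
    echo $(( ($1 - $2) & 0xFFFF ))
}

# The same subtraction, but clamped at zero: one way to guard the
# computation so that "no space left" never wraps into "lots of space".
u16_sub_guarded() {
    if [ "$1" -gt "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo 0
    fi
}

# Zero free slots, one slot reserved for the end-of-batch marker:
u16_sub 0 1            # wraps around instead of going negative
u16_sub_guarded 0 1    # stays at zero
```

The wrapped value reads as an almost empty batch, which matches the commit's description of the driver being convinced it had room for many more commands.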
Hmm, but is it the same assertion? Can you please look for the login manager logs for the stderr from Xorg? (e.g. /var/log/gdm/:0.log) Also added a little more DBG messages before that assert: commit faf0bdd477b9ec73f943c3101a3ae30fd6d579ea Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Sep 9 07:36:40 2014 +0100 sna: Add some DBG spam for BLT boxes References: https://bugs.freedesktop.org/show_bug.cgi?id=77074 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (In reply to comment #62) > Hmm, but is it the same assertion? Can you please look for the login manager > logs for the stderr from Xorg? (e.g. /var/log/gdm/:0.log) I can't get the gdm logs, 'cause again it's not updated like what had with Xorg logs before. The reason that I could get the xorg logs was I disabled gdm and used startx to start my X, which is rootless this way, but not when using a display manager. On the VT you ran startx, and presumably switched back to after the crash, did you notice the line number of the assert (and what the assert was)? I doubt it will be a eureka moment, but hopefully will narrow down where to look next. (In reply to comment #65) > On the VT you ran startx, and presumably switched back to after the crash, > did you notice the line number of the assert (and what the assert was)? > > I doubt it will be a eureka moment, but hopefully will narrow down where to > look next. Been playing with Aegisub for half an hour, no crash so far. If you have a second computer (or smartphone and a ssh app), could you switch back to --enable-debug and run with gdb attached to Xorg via the remote connection? It is possible that the extra DBG is masking the bug, but catching the assertion would hopefully be enough information. Alternatively, we could try and find where systemd hides that information! Yeah, I have a smartphone, but you'll have to handhold me to finish the gdb stuff. But first I can check if the bug is still there by running Aegisub without --enable-debug, right? 
Yeah, hopefully the --enable-debug is just causing it to crash earlier and so help narrow down the root cause. Without --enable-debug, it should mostly work, or at least crash in the same location as before. (In reply to comment #69) > Yeah, hopefully the --enable-debug is just causing it to crash earlier and > so help narrow down the root cause. Without --enable-debug, it should mostly > work, or at least crash in the same location as before. yeah, it won't crash, and Aegisub feel kinda sluggish. I think I'm ready to try other options. If it feels slow, I would guess that X is using sw fallbacks (perhaps a GPU hang?). Anything in dmesg/Xorg.0.log? (If you can extract such from journald). Created attachment 106030 [details]
dmesg output
You can take a look at the dmesg output.
Ok, not a GPU hang. Could I check the Xorg.0.log equiv?

(In reply to comment #72)
> Created attachment 106030 [details]
> dmesg output
>
> You can take a look at the dmesg output.

(In reply to comment #73)
> Ok, not a GPU hang. Could I check the Xorg.0.log equiv?

No Xorg.0.log since I enabled gdm, which means X is running as root. But I just figured out why Xorg.0.log is not updated when X is running as root. Gonna do a test. Will report back. Btw, can I just install the stable version of xf86-video-intel and test with --enable-debug=full? 'Cause I'm afraid it still won't crash with the latest git commit.

(In reply to comment #74)
> Btw, can I just install the stable
> version of xf86-video-intel and test with --enable-debug=full? 'Cause I'm
> afraid it still won't crash with the latest git commit.

And find the assert? Yes. Just annoying that you then won't have the extra debugging I added for this bug.

Created attachment 106034 [details]
previous gdm logs outputted with command "journalctl -r /usr/bin/gdm"
Previous gdm logs outputted with command "journalctl -r /usr/bin/gdm". Take a look. Now I know how to get latest Xorg logs using journalctl. Just need to get X to crash now.
Hmm, that didn't include the usual information I would expect from /var/log/gdm/:0.log. Perhaps try: journalctl -r /usr/bin/gdm + /usr/bin/gnome-shell + /usr/bin/Xorg Created attachment 106040 [details]
xorg log for today
This is the xorg log for today. No log entries for gnome shell.
Using DRI3+Present, it will feel sluggish. Nothing else stands out - I was looking for a message that it failed to submit some rendering and disabled acceleration, but that is absent. Created attachment 106046 [details]
journalctl -r -b -1 /usr/bin/Xorg.bin
Just had a crash. But I still don't know how to get the /var/log/gdm/:0.log file you need using journalctl.
Created attachment 106047 [details]
journalctl -r -b -1 /usr/bin/gdm
Is this the assertion you are looking for?
No. :(

If you are building the crashy version from git, could you do:

$ cd xf86-video-intel
$ git fetch
$ git cherry-pick 224af800f695b50ba5a65b5a2b9ca1e7a88d4e1a
$ make && sudo make install

That will (hopefully!) dump the assert to where journalctl /usr/bin/Xorg.bin will find it.

(In reply to comment #82)
> No. :(
>
> If you are building the crashy version from git, could you do:
>
> $ cd xf86-video-intel
> $ git fetch
> $ git cherry-pick 224af800f695b50ba5a65b5a2b9ca1e7a88d4e1a
> $ make && sudo make install
>
> That will (hopefully!) dump the assert to where journalctl /usr/bin/Xorg.bin
> will find it.

I just used the xf86-video-intel-git from arch aur. Can you help figure out how to add the git commands to the PKGBUILD file? Here is the link: https://aur.archlinux.org/packages/xf/xf86-video-intel-git/PKGBUILD

Created attachment 106052 [details]
PKGBUILD
I think this should do, just adding git fetch && git cherry-pick before the actual build.
(In reply to comment #84) > Created attachment 106052 [details] > PKGBUILD > > I think this should do, just adding git fetch && git cherry-pick before the > actual build. And I also add --enable-debug=full, then wait for it to crash, right? Wait with fingers crossed... Created attachment 106055 [details]
Build error
Had this error when building.
Hmm, it is building with the patch already included. Just scrub the two extra lines in PKGBUILD and press onwards. Created attachment 106058 [details]
Latest xorg log
X crashed and here is the log. Tell me if you find what you need.
No. It is the right build, so it should be dumping the FatalError on asserts now. What's odd in the log file is: Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_accel_do_throttle -- no pending activity Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: has_shadow: has pending damage? 0, outstanding flips: 0 Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_block_handler (tv=597.543000) Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_do_throttle (time=528700), triggered Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_scanout_do_flush: flush timer active: delta=15 Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_block_handler (tv=-1.0) Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_flush: flush?=0, dirty?=0 Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_flush: flush?=0, dirty?=0 Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_wakeup_handler: nbatch=0, need_retire=0, need_purge=0 Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_wakeup_handler There is a 30 second period where one X disappears and new one starts up without that transition being logged at all. (The log begins with usual startup for Xorg-:0[541], but nothing at all is visible for Xorg-:1[1187]) ARGH. (In reply to comment #90) > No. It is the right build, so it should be dumping the FatalError on asserts > now. What's odd in the log file is: > > Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler > Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_accel_do_throttle -- no > pending activity > Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: has_shadow: has pending > damage? 
0, outstanding flips: 0 > Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_block_handler > (tv=597.543000) > Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler > Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_do_throttle > (time=528700), triggered > Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_scanout_do_flush: flush > timer active: delta=15 > Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_block_handler (tv=-1.0) > Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_flush: flush?=0, > dirty?=0 > Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_flush: flush?=0, > dirty?=0 > Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_wakeup_handler: > nbatch=0, need_retire=0, need_purge=0 > Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_wakeup_handler > > There is a 30 second period where one X disappears and new one starts up > without that transition being logged at all. (The log begins with usual > startup for Xorg-:0[541], but nothing at all is visible for Xorg-:1[1187]) > > ARGH. Maybe I don't use GDM to start my X and just use startx to start a rootless X and when the crash happens, the assert should be displayed on the VT? Will this do? (In reply to comment #91) > Maybe I don't use GDM to start my X and just use startx to start a rootless > X and when the crash happens, the assert should be displayed on the VT? Will > this do? Yes. All I want (today!) is that fatal error message telling me which line to look at. If you use startx, it should be visible immediately after X crashes. (In reply to comment #92) > (In reply to comment #91) > > Maybe I don't use GDM to start my X and just use startx to start a rootless > > X and when the crash happens, the assert should be displayed on the VT? Will > > this do? > > Yes. All I want (today!) is that fatal error message telling me which line > to look at. If you use startx, it should be visible immediately after X > crashes. OK. Fingers crossed. 
Created attachment 106061 [details]
Photos of the error message.
Here you go. Pls tell me it's what you need. Otherwise you and I are both gonna go crazy.
Created attachment 106062 [details]
left screen
Photos of the error message.
Created attachment 106063 [details]
right screen
Photos of the error message.
Erf. That's weird. However, look in ~/.local/share/xorg/Xorg.0.log* Created attachment 106064 [details]
Xorg.0.log.old with head -1500 and tail -3000
OK tell me this is it. I saw "sna_damage_add:48 assertion '!DAMAGE_IS_ALL(*damage)' failed"
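Picking the failed assertion out of a large debug log by eye is tedious; a filter along these lines can pull it out. Sketch only: find_asserts is a hypothetical name, and the pattern assumes the message shape seen in this log ("function:line assertion '...' failed"), which other builds may not share.

```shell
#!/bin/sh
# Print every line on stdin that reports a failed assertion in the form
#   <function>:<line> assertion '<expression>' failed
find_asserts() {
    grep -E "assertion '.*' failed"
}

# Hypothetical usage against the rootless-X log location mentioned in
# this thread:
#   find_asserts < ~/.local/share/xorg/Xorg.0.log.old
```

Because grep exits non-zero when nothing matches, the function's exit status also tells a script whether any assertion was logged at all.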
Yes! We have finally struck gold! (In reply to comment #99) > Yes! We have finally struck gold! Yay! Glad we got this son of ***. And I desperately need my sleep. Let me know if you need more debug info. Thanks. Scratches head. I doubt that's the original bug, but here's the fix for that assert: commit 9b25eeee85d32223841640c3a39901e4b63707ce Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Sep 10 16:37:16 2014 +0100 sna: Do apply damage twice for miSpans.PolyFillRect As the caller will apply the damage afterwards, we do not need to do the accumulation in the miSpans callbacks and it presumes that its damage is unaltered. References: https://bugs.freedesktop.org/show_bug.cgi?id=77074 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (In reply to comment #101) > Scratches head. > > I doubt that's the original bug, but here's the fix for that assert: > > commit 9b25eeee85d32223841640c3a39901e4b63707ce > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Wed Sep 10 16:37:16 2014 +0100 > > sna: Do apply damage twice for miSpans.PolyFillRect > > As the caller will apply the damage afterwards, we do not need to do the > accumulation in the miSpans callbacks and it presumes that its damage is > unaltered. > > References: https://bugs.freedesktop.org/show_bug.cgi?id=77074 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Thanks, will test it out. Btw, should I disable DRI3? 'Cause I read on Arch news that there are multiple rendering bugs, so it's disabled by default in the xf86-video-intel of their official repo. Created attachment 106093 [details]
Xorg log without --enable-debug=full
OK. It crashed again.
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) Caught signal 11 (Segmentation fault). Server aborting
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: Fatal server error:
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE)
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) Segmentation fault at address 0x0
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE)
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 11: /usr/bin/Xorg.bin (0x400000+0x25d0e) [0x425d0e]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 10: /usr/lib/libc.so.6 (__libc_start_main+0xf0) [0x7f2dca9e0000]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 9: /usr/bin/Xorg.bin (0x400000+0x3b866) [0x43b866]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 8: /usr/bin/Xorg.bin (0x400000+0x376d7) [0x4376d7]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 7: /usr/bin/Xorg.bin (0x400000+0x33caa) [0x433caa]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 6: /usr/bin/Xorg.bin (0x400000+0x11c74a) [0x51c74a]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 5: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f2dc5c08000+0x4f08b) [0x7f2dc5c5708b]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f2dc5c08000+0x63c26) [0x7f2dc5c6bc26]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 3: /usr/lib/libpixman-1.so.0 (pixman_region_fini+0x9) [0x7f2dcb91b879]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 2: /usr/lib/libc.so.6 (0x7f2dca9c0000+0x33df0) [0x7f2dca9f3df0]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 1: /usr/bin/Xorg.bin (0x400000+0x197b69) [0x597b69]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 0: /usr/bin/Xorg.bin (xorg_backtrace+0x56) [0x593966]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) Backtrace:
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE)

Hmm, no --enable-debug at all? Could you run

addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x4f08b 0x63c26
addr2line -e /usr/bin/Xorg.bin -i 0x3b866 0x376d7 0x33caa 0x11c74a

and then rebuild with --enable-debug again.

(In reply to comment #102)
> Thanks, will test it out. Btw, should I disable DRI3? 'Cause I read on Arch
> news that there are multiple rendering bugs, so it's disabled by default in
> the xf86-video-intel of their official repo.

It's disabled because it is incomplete and lacks true synchronisation with X (and compositors), hence resulting in delayed rendering (though it should be mostly correct rendering, just at the wrong time).

[tf@archins3437 ~]$ addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x4f08b 0x63c26
??:0
??:0
[tf@archins3437 ~]$ addr2line -e /usr/bin/Xorg.bin -i 0x3b866 0x376d7 0x33caa 0x11c74a
??:0
??:0
??:0
??:0

What's wrong? Will build with debug=full again.

Oh I forgot, Arch completely strips all debug symbols from its installs (and doesn't install the separate dso.debug). There's probably a way to prevent that if you know Arch well enough, but let's hope it presents itself again with --enable-debug=full :|

(In reply to comment #107)
> Oh I forgot, Arch completely strips all debug symbols from its installs (and
> doesn't install the separate dso.debug). There's probably a way to prevent
> that if you know Arch well enough, but let's hope it presents itself again
> with --enable-debug=full :|

I have put "!strip" in the options with this build. Let's hope it crashes soon enough.

Created attachment 106105 [details]
Xorg log without --enable-debug=full
Sep 11 14:51:04 archins3437 gdm-Xorg-:0[1247]: (II) UnloadModule: "evdev" Sep 11 14:51:04 archins3437 gdm-Xorg-:0[1247]: (II) evdev: Dell WMI hotkeys: Close Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: _sna_blt_fill_boxes: ffffff x 1 Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: box_from_seg: seg=(1,0),(1,124); box=(1,0),(2,124) Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: __kgem_bo_mark_dirty: handle=19 (proxy? 0) Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: kgem_add_handle: handle=19, index=0 Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: kgem_add_reloc: handle=19, pos=4, delta=0, domains=28002 You wouldn't happen to have the missing 30s? /o\ Looking at the logfile it appears that journald only outputs a small amount of the log every 30s. I think we will just have to rely on --enable-debug (not --enable-debug=full). (In reply to comment #111) > Looking at the logfile it appears that journald only outputs a small amount > of the log every 30s. I think we will just have to rely on --enable-debug > (not --enable-debug=full). OK. Let's just say GDM and journald suck. Or should I try again with startx? If you have time to run with --enable-debug=full, please, please do so. :) In which case use startx whilst on a debugging run. Created attachment 106109 [details]
Xorg.0.log.old (2014-09-11T16-06) with head -1500 and tail -3000
I saw this: sna_damage_add_to_pixmap:75 assertion '!DAMAGE_IS_ALL(*damage)' failed
Still the head scratcher from last night, just a second place along the same path that also modified the damage.

commit 797369449b87cbd578f9fb96f34b065e548755f6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Sep 11 09:37:11 2014 +0100

    sna: Do not mark the pixmap as cleared in the middle of a miSpans decomposition

    As the miSpans will continue to overdraw the Pixmap, it's final state will no longer be that clear value. We need to be much more careful when allowing that optimisation.

    Reported-by: Tyler Foo <tftylerfoo@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

That should finally put that to rest - it could be the cause of death, but it might not be...

(In reply to comment #115)
> Still the head scratcher from last night, just a second place along the same
> path that also modified the damage.
>
> commit 797369449b87cbd578f9fb96f34b065e548755f6
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Thu Sep 11 09:37:11 2014 +0100
>
> sna: Do not mark the pixmap as cleared in the middle of a miSpans
> decomposition
>
> As the miSpans will continue to overdraw the Pixmap, it's final state
> will no longer be that clear value. We need to be much more careful when
> allowing that optimisation.
>
> Reported-by: Tyler Foo <tftylerfoo@gmail.com>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> That should finally put that to rest - it could be the cause of death, but
> it might not be...

OK. I'll test a little more when I have time.

Any news?

(In reply to comment #117)
> Any news?

I've used aegisub a couple of times. Nothing so far.

I am going to claim it's fixed - that's bound to provoke it into failing again!

Created attachment 108375 [details]
dmesg
Hey Chris, my computer had multiple freezes this month. Can you take a look at the dmesg to see if it has something to do with this bug?
dmesg and journalctl output attached.
Created attachment 108376 [details]
journalctl output
If this has nothing to do with this bug, I'll file a separate bug instead.

Yes, that is a separate (kernel) bug. We have had a few reports like that and we put a workaround into 3.17 to reduce the impact, but we don't know what's causing it yet.

(In reply to Chris Wilson from comment #123)
> Yes, that is a separate (kernel) bug. We have had a few reports like that
> and we put a workaround into 3.17 to reduce the impact, but we don't know
> what's causing it yet.

Ok, this is good and bad. Good that it's not this bug again. Bad that we don't have a solution yet and it's really hard to reproduce. It just happens. So will downgrading the kernel help? Or should I downgrade the xf86-video-intel driver? I really need a working system these days.

As far as we know, this bug was introduced in kernel 3.16.

I think this problem is fixed, as I encountered the same issue on OpenBSD with GIMP. I had a very easy repro case: just opening a new file and navigating the file browser in the recently opened files. Xorg crashed each time.
More details reported here: https://marc.info/?l=openbsd-bugs&m=154706833406795&w=2

GDB details from openbsd-bugs email:

(gdb) bt
#0  0x00000aeb3630ff3a in sna_blt_copy_boxes (sna=0xaeb33262000, alu=3 '\003', src_bo=0xaeb79f86400, src_dx=0, src_dy=0, dst_bo=0xaeb79f8a200, dst_dx=0, dst_dy=0, bpp=32, box=0xaeb63870000, nbox=0)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_blt.c:3759
#1  0x00000aeb363544e9 in no_render_copy_boxes (sna=0xaeb33262000, alu=3 '\003', src=0xaeb7ab1b080, src_bo=0xaeb79f86400, src_dx=0, src_dy=0, dst=0xaeb7ab1b080, dst_bo=0xaeb79f8a200, dst_dx=0, dst_dy=0, box=0xaeb63868010, n=2038, flags=0)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_render.c:137
#2  0x00000aeb362d2907 in sna_pixmap_move_to_gpu (pixmap=0xaeb7ab1b080, flags=10)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_accel.c:4246
#3  0x00000aeb362f375a in sna_copy_boxes (src=0xaeb7ab1b080, dst=0xaeb1507e400, gc=0xaeacb235a00, region=0x7f7ffffe9750, dx=-616, dy=-72, bitplane=0, closure=0x0)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_accel.c:6387
#4  0x00000aeb362f5122 in sna_do_copy (src=0xaeb7ab1b080, dst=0xaeb1507e400, gc=0xaeacb235a00, sx=0, sy=0, width=1535, height=1012, dx=616, dy=72, copy=0xaeb362f2f00 <sna_copy_boxes>, bitPlane=0, closure=0x0)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_accel.c:6959
#5  0x00000aeb362dd3c7 in sna_copy_area (src=0xaeb7ab1b080, dst=0xaeb1507e400, gc=0xaeacb235a00, src_x=0, src_y=0, width=1535, height=1012, dst_x=245, dst_y=71)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_accel.c:7041
#6  0x00000ae8a1bdd17d in damageCopyArea (pSrc=0xaeb7ab1b080, pDst=0xaeb1507e400, pGC=0xaeacb235a00, srcx=0, srcy=0, width=1535, height=1012, dstx=245, dsty=71)
    at /home/mkucharski/openbsd/xenocara/xserver/miext/damage/damage.c:775
#7  0x00000ae8a1a4728a in ProcCopyArea (client=0xaeb6c1f3800)
    at /home/mkucharski/openbsd/xenocara/xserver/dix/dispatch.c:1722
#8  0x00000ae8a1a41df0 in Dispatch ()
    at /home/mkucharski/openbsd/xenocara/xserver/dix/dispatch.c:480
#9  0x00000ae8a1a55479 in dix_main (argc=7, argv=0x7f7ffffe9b18, envp=0x7f7ffffe9b58)
    at /home/mkucharski/openbsd/xenocara/xserver/dix/main.c:287
#10 0x00000ae8a1a2e357 in main (argc=7, argv=0x7f7ffffe9b18, envp=0x7f7ffffe9b58)
    at /home/mkucharski/openbsd/xenocara/xserver/dix/stubmain.c:34
(gdb) list /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_blt.c:3759
3754
3755            assert(box->x1 >= 0);
3756            assert(box->y1 >= 0);
3757
3758            *(uint64_t *)&b[0] = hdr;
3759            *(uint64_t *)&b[2] = *(const uint64_t *)box;
3760            *(uint64_t *)(b+4) =
3761                    kgem_add_reloc64(kgem, kgem->nbatch + 4, dst_bo,
3762                                     I915_GEM_DOMAIN_RENDER << 16 |
3763                                     I915_GEM_DOMAIN_RENDER |
...
(gdb) print box
$2 = (const BoxRec *) 0xaeb63870000
(gdb) print *(const uint64_t *)box
Cannot access memory at address 0xaeb63870000
...
(gdb) print *(const uint64_t *) 0xaeb63870000
Cannot access memory at address 0xaeb63870000
(gdb) print *(const uint64_t *) 0xaeb63868010
$5 = 568481871298560

What I see in the above backtrace: inside sna_blt_copy_boxes() box=0xaeb63870000, whereas in no_render_copy_boxes() box=0xaeb63868010, and that results in an Xorg crash when the box variable is accessed.

(gdb) bt
#0  0x00000aeb3630ff3a in sna_blt_copy_boxes (sna=0xaeb33262000, alu=3 '\003', src_bo=0xaeb79f86400, src_dx=0, src_dy=0, dst_bo=0xaeb79f8a200, dst_dx=0, dst_dy=0, bpp=32,
    box=0xaeb63870000, nbox=0)
    ^^^^^^^^^^^^^^^^^
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_blt.c:3759
#1  0x00000aeb363544e9 in no_render_copy_boxes (sna=0xaeb33262000, alu=3 '\003', src=0xaeb7ab1b080, src_bo=0xaeb79f86400, src_dx=0, src_dy=0, dst=0xaeb7ab1b080, dst_bo=0xaeb79f8a200, dst_dx=0, dst_dy=0,
    box=0xaeb63868010, n=2038, flags=0)
    ^^^^^^^^^^^^^^^^^
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_render.c:137
...
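As a rough cross-check of the two box pointers in the backtrace above, the arithmetic can be sketched in a few lines of Python. The addresses, n=2038 and the value printed as $5 are copied from the gdb session. The 8-byte BoxRec layout (four 16-bit fields x1, y1, x2, y2), the little-endian byte order and the 4 KiB page size are assumptions, and the helper names are made up for illustration. This is only an observation about the numbers, not a diagnosis of the driver bug.

```python
import struct

# Values copied from the gdb backtrace above.
BOX_IN_CALLER = 0xAEB63868010  # box in no_render_copy_boxes (frame #1), readable
BOX_AT_CRASH = 0xAEB63870000   # box in sna_blt_copy_boxes (frame #0), unreadable
N_BOXES = 2038                 # n passed in frame #1
VALUE_AT_CALLER = 568481871298560  # gdb's $5, the uint64_t read at BOX_IN_CALLER

# Assumptions (not stated in the thread): BoxRec is four int16 fields
# (x1, y1, x2, y2) = 8 bytes, the machine is little-endian, pages are 4 KiB.
BOXREC_SIZE = 8
PAGE_SIZE = 4096


def boxes_advanced(start, current, size=BOXREC_SIZE):
    """Return (whole boxes, leftover bytes) between two box pointers."""
    return divmod(current - start, size)


def decode_box(raw):
    """Split a 64-bit value into (x1, y1, x2, y2) under the assumed layout."""
    return struct.unpack("<4h", struct.pack("<Q", raw))


advanced, leftover = boxes_advanced(BOX_IN_CALLER, BOX_AT_CRASH)
print(f"byte offset between pointers: {BOX_AT_CRASH - BOX_IN_CALLER:#x}")
print(f"boxes advanced: {advanced} (leftover bytes: {leftover})")
print(f"further than the n={N_BOXES} boxes passed in frame #1: {advanced > N_BOXES}")
print(f"crash pointer is page-aligned: {BOX_AT_CRASH % PAGE_SIZE == 0}")
print(f"first box at the caller's pointer: {decode_box(VALUE_AT_CALLER)}")
```

Under those assumptions the two pointers are 0x7ff0 bytes apart, which is 4094 boxes, more than the 2038 passed in frame #1, and the faulting address sits exactly on a page boundary. The value gdb could still read at the caller's pointer decodes to the box (0, 0)-(1288, 2).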
Yesterday I compiled e5ff8e1828f97891c819c919d7115c6e18b2eb1f from https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel.git. The only problem on the way was bugzilla id 109268 (byteswap.h not available on OpenBSD), and the crash is gone with the latest code of the xf86-video-intel driver.