Bug 77074 - Xorg crashes while using Aegisub
Summary: Xorg crashes while using Aegisub
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-05 02:50 UTC by Tyler Foo
Modified: 2019-01-10 12:48 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
journalctl output of aegisub (9.70 KB, text/plain)
2014-04-05 02:54 UTC, Tyler Foo
Xorg.0.log.old file when the crash happened (27.49 KB, text/plain)
2014-04-05 07:53 UTC, Tyler Foo
Xorg.0.log file after the crash (25.90 KB, text/plain)
2014-04-05 07:54 UTC, Tyler Foo
addr2line output (297 bytes, text/plain)
2014-04-05 11:30 UTC, Tyler Foo
Xorg log with the self-compiled version of xf86-video-intel (27.29 KB, text/plain)
2014-04-05 12:45 UTC, Tyler Foo
addr2line output number 2 (297 bytes, text/plain)
2014-04-05 14:07 UTC, Tyler Foo
PKGBUILD (1.62 KB, text/plain)
2014-08-21 12:28 UTC, Tyler Foo
xorg log file (1.88 MB, text/plain)
2014-08-21 12:52 UTC, Tyler Foo
xorg debug log (298.75 KB, text/plain)
2014-09-08 10:52 UTC, Tyler Foo
xorg debug log with the latest git commit. (301.00 KB, text/plain)
2014-09-09 00:10 UTC, Tyler Foo
dmesg output (57.70 KB, text/plain)
2014-09-10 08:32 UTC, Tyler Foo
previous gdm logs outputted with command "journalctl -r /usr/bin/gdm" (105.04 KB, text/plain)
2014-09-10 09:35 UTC, Tyler Foo
xorg log for today (516.24 KB, text/plain)
2014-09-10 09:52 UTC, Tyler Foo
journalctl -r -b -1 /usr/bin/Xorg.bin (1002.91 KB, text/plain)
2014-09-10 10:16 UTC, Tyler Foo
journalctl -r -b -1 /usr/bin/gdm (1.34 KB, text/plain)
2014-09-10 10:19 UTC, Tyler Foo
PKGBUILD (1.65 KB, text/plain)
2014-09-10 11:10 UTC, Chris Wilson
Build error (419 bytes, text/plain)
2014-09-10 12:43 UTC, Tyler Foo
Latest xorg log (316.21 KB, text/plain)
2014-09-10 13:10 UTC, Tyler Foo
Photos of the error message. (1.13 MB, image/jpeg)
2014-09-10 14:15 UTC, Tyler Foo
left screen (1.65 MB, image/jpeg)
2014-09-10 14:16 UTC, Tyler Foo
right screen (1.48 MB, image/jpeg)
2014-09-10 14:16 UTC, Tyler Foo
Xorg.0.log.old with head -1500 and tail -3000 (307.92 KB, text/plain)
2014-09-10 15:19 UTC, Tyler Foo
Xorg log without --enable-debug=full (127.92 KB, text/plain)
2014-09-10 23:39 UTC, Tyler Foo
Xorg log without --enable-debug=full (651.82 KB, text/plain)
2014-09-11 06:57 UTC, Tyler Foo
Xorg.0.log.old (2014-09-11T16-06) with head -1500 and tail -3000 (311.98 KB, text/plain)
2014-09-11 08:11 UTC, Tyler Foo
dmesg (71.67 KB, text/plain)
2014-10-24 23:36 UTC, Tyler Foo
journalctl output (2.69 MB, text/plain)
2014-10-24 23:37 UTC, Tyler Foo

Description Tyler Foo 2014-04-05 02:50:44 UTC

    
Comment 1 Tyler Foo 2014-04-05 02:54:12 UTC
Created attachment 96929 [details]
journalctl output of aegisub

Output from "journalctl | grep -i 'aegisub'"
Comment 2 Tyler Foo 2014-04-05 02:56:00 UTC
This only happens when I set "AccelMethod" to "sna".
Comment 3 Chris Wilson 2014-04-05 06:17:43 UTC
I at least need the Xorg.0.log containing the stacktrace. Preferably it will have its symbols resolved. As you can reproduce this, please try and attach gdb.
Comment 4 Tyler Foo 2014-04-05 07:53:03 UTC
Created attachment 96934 [details]
Xorg.0.log.old file when the crash happened
Comment 5 Tyler Foo 2014-04-05 07:54:52 UTC
Created attachment 96935 [details]
Xorg.0.log file after the crash
Comment 6 Tyler Foo 2014-04-05 07:57:06 UTC
(In reply to comment #3)
> I at least need the Xorg.0.log containing the stacktrace. Preferably it
> will have its symbols resolved. As you can reproduce this, please try and
> attach gdb.

Sorry, this seems quite complicated to me, but when I have time, I'll certainly try it. In the meantime, please check the Xorg.0.log.old file. Thanks.
Comment 7 Chris Wilson 2014-04-05 09:15:18 UTC
Ok, since the backtrace contains no symbols, can you try

addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x5ee02 0x4ba12

It doesn't tend to be as accurate as gdb, but it may help in the interim. Note that this also requires the debug packages for Xorg and xf86-video-intel to be installed.

Simply loading aegisub doesn't crash for me (at least on the first machine I installed it on). Is there any particular sequence required, or a particular file?
Comment 8 Tyler Foo 2014-04-05 09:43:07 UTC
(In reply to comment #7)
> Ok, since the backtrace contains no symbols, can you try
> 
> addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x5ee02 0x4ba12
> 
> It doesn't tend to be as accurate as gdb, but it may help in the interim.
> Note that this also requires the debug packages for Xorg and
> xf86-video-intel to be installed.
> 
> Simply loading aegisub doesn't crash for me (at least on the first machine
> I installed it on). Is there any particular sequence required, or a
> particular file?

Can you tell me specifically which debug packages are required? I'm running Arch Linux.

About the crash: it isn't tied to a particular sequence or a particular file. Most of the time it happened while I was editing or reviewing subtitles. Also, most of the video files I was working with were downloaded from YouTube, so they were encoded with ffh264 for video and ffaac for audio.

Also, my graphics card is a Haswell HD 4400, if that helps.
Comment 9 Chris Wilson 2014-04-05 09:49:06 UTC
I found this https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces which suggests you can use "pacman -Qo /usr/lib/xorg/modules/drivers/intel_drv.so" to get the name of the package to rebuild with debug symbols. Then it looks like you need to manually tweak the package manifest to keep the debug symbols, rebuild and then reinstall the package.
Comment 10 Tyler Foo 2014-04-05 11:19:14 UTC
(In reply to comment #9)
> I found this https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces
> which suggests you can use "pacman -Qo
> /usr/lib/xorg/modules/drivers/intel_drv.so" to get the name of the package
> to rebuild with debug symbols. Then it looks like you need to manually tweak
> the package manifest to keep the debug symbols, rebuild and then reinstall
> the package.

OK, I reinstalled the self-compiled version of xf86-video-intel. Now what? I mean, how do I get the debug info that you need? I have gdb installed, but I'm not quite sure how to use it.
Comment 11 Chris Wilson 2014-04-05 11:26:14 UTC
With any luck the addresses are still valid, and addr2line (part of binutils) should work. Using gdb requires a lot of heartache or a second computer. Using a second computer is much easier...
Comment 12 Tyler Foo 2014-04-05 11:30:47 UTC
Created attachment 96945 [details]
addr2line output
Comment 13 Tyler Foo 2014-04-05 11:31:27 UTC
(In reply to comment #11)
> With any luck the addresses are still valid, and addr2line (part of
> binutils) should work. Using gdb requires a lot of heartache or a second
> computer. Using a second computer is much easier...

Ok, I just attached the addr2line output.
Comment 14 Chris Wilson 2014-04-05 11:43:48 UTC
Hmm, I think my hope that the addresses would match up in the new package was misplaced. Do you mind reproducing the crash with the new driver and attaching the Xorg.0.log file? If you want to jump ahead to running addr2line, be my guest! You need to feed it the hex offsets (the part after the '+') for each intel_drv.so frame in the stack. Otherwise I can tell you which addr2line command to run. Thanks.
Comment 15 Tyler Foo 2014-04-05 12:31:08 UTC
(In reply to comment #14)
> Hmm, I think my hope that the addresses would match up in the new package
> was misplaced. Do you mind reproducing the crash with the new driver and
> attaching the Xorg.0.log file? If you want to jump ahead to running
> addr2line, be my guest! You need to feed it the hex offsets (the part after
> the '+') for each intel_drv.so frame in the stack. Otherwise I can tell you
> which addr2line command to run. Thanks.

Wow... this is crazy. I've been playing with it for over half an hour now and it still hasn't crashed. In this case that's really not a good thing. ;)
Comment 16 Tyler Foo 2014-04-05 12:45:25 UTC
Created attachment 96949 [details]
Xorg log with the self-compiled version of xf86-video-intel
Comment 17 Tyler Foo 2014-04-05 12:46:43 UTC
(In reply to comment #14)
> Hmm, I think my hope that the addresses would match up in the new package
> was misplaced. Do you mind reproducing the crash with the new driver and
> attaching the Xorg.0.log file? If you want to jump ahead to running
> addr2line, be my guest! You need to feed it the hex offsets (the part after
> the '+') for each intel_drv.so frame in the stack. Otherwise I can tell you
> which addr2line command to run. Thanks.

Ok, it's finally crashed. Check out the Xorg log.
Comment 18 Chris Wilson 2014-04-05 13:50:54 UTC
Please run:

addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x5ee02 0x4ba12
Comment 19 Tyler Foo 2014-04-05 14:07:08 UTC
Created attachment 96953 [details]
addr2line output number 2

It seems the result is the same as the last one. Did I do something wrong?
Comment 20 Chris Wilson 2014-04-05 20:24:55 UTC
Ok, that is actually consistent against 2.99.911 (I looked against master originally where it didn't make sense). I'm still considering how it fails though, my first thought is that the region is broken due to an allocation failure which then causes REGION_NUM_RECTS() to die.
Comment 21 Tyler Foo 2014-04-06 00:56:03 UTC
(In reply to comment #20)
> Ok, that is actually consistent against 2.99.911 (I looked against master
> originally where it didn't make sense). I'm still considering how it fails
> though, my first thought is that the region is broken due to an allocation
> failure which then causes REGION_NUM_RECTS() to die.

If you need anything else to help you figure out, just ask. Thank you!
Comment 22 Chris Wilson 2014-04-06 19:29:57 UTC
Bizarre, I still can't see how we would get a NULL pointer dereference there. Can you please look into changing the configure line within the package build script to include --enable-debug and reinstall? That will add assertion checks which might help catch the issue earlier. Most helpful would be to add --enable-debug=full, but that might prevent the bug entirely.
Comment 23 Tyler Foo 2014-04-07 04:03:17 UTC
(In reply to comment #22)
> Bizarre, I still can't see how we would get a NULL pointer dereference
> there. Can you please look into changing the configure line within the
> package build script to include --enable-debug and reinstall? That will add
> assertion checks which might help catch the issue earlier. Most helpful
> would be to add --enable-debug=full, but that might prevent the bug entirely.

OK, I got the file, but it is too big to upload (2.5 GB). Any suggestions?
Comment 24 Tyler Foo 2014-04-07 05:32:02 UTC
I uploaded the file to Dropbox. Here is the link: https://www.dropbox.com/s/eud5zatjsgyh09e/Xorg.0.log.old.zip
Comment 25 Chris Wilson 2014-04-07 07:02:28 UTC
Thanks, but I don't think that is the original bug...

commit f5014b3fddf6c79f5ca01a91eec5ca92184c8829
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 7 07:59:01 2014 +0100

    sna: Avoid double application of pixel widening for degenerate lines
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

will fix the assertion failure in the full-debug log.
Comment 26 Tyler Foo 2014-04-07 07:42:41 UTC
(In reply to comment #25)
> Thanks, but I don't think that is the original bug...

What?! Do you need me to do it again?
Comment 27 Chris Wilson 2014-04-07 07:47:49 UTC
I would just run with xf86-video-intel.git for a while and see if the bug recurs first.
Comment 28 Tyler Foo 2014-04-07 14:32:08 UTC
(In reply to comment #27)
> I would just run with xf86-video-intel.git for a while and see if the bug
> recurs first.

I've been running the git version for a while. Did some subtitle editing in Aegisub, no crashes so far.
Comment 29 Chris Wilson 2014-04-09 17:20:14 UTC
Let's assume fixed until proven otherwise. Please do reopen if it dies again.
Comment 30 dj.dill 2014-06-15 12:01:56 UTC
Hi, I am also seeing this bug.
Aegisub 3.0.4
Lubuntu 14.04

The fix was to switch to UXA acceleration, as advised in:
https://bugs.archlinux.org/task/39739


However, in standard SNA mode Aegisub would crash Xorg after 1+ minutes of playback.

Is there an upstream fix?
Comment 31 Chris Wilson 2014-06-15 12:10:16 UTC
(In reply to comment #30)
> Hi, I am also seeing this bug.
> Aegisub 3.0.4
> Lubuntu 14.04
> 
> The fix was to switch to UXA acceleration, as advised in:
> https://bugs.archlinux.org/task/39739
> 
> 
> However, in standard SNA mode Aegisub would crash Xorg after 1+ minutes
> of playback.
> 
> Is there an upstream fix?

YES! The bug you reopened is about the fix.
Comment 32 dj.dill 2014-06-15 12:28:03 UTC
Great! How quickly do the changes get pulled into mainstream releases? I see this was fixed a few months ago.

Here is the crash log if you're interested:
https://www.dropbox.com/s/8154w7p95kvq8le/_usr_bin_Xorg.0.crash
Comment 33 Chris Wilson 2014-06-15 12:32:00 UTC
Obviously not as quickly as I would like.
Comment 34 Tyler Foo 2014-08-21 05:17:51 UTC
Hi Chris, the problem seems to be back.

I'm running Aegisub 3.2.0-2 and xf86-video-intel 2.99.914.60.gf546968-1 on Arch Linux.
Comment 35 Chris Wilson 2014-08-21 05:35:49 UTC
Are you sure? Have you captured the updated debug information?
Comment 36 Tyler Foo 2014-08-21 05:43:57 UTC
(In reply to comment #35)
> Are you sure? Have you captured the updated debug information?

Yes, it crashes like hell. I'll have to compile xf86-video-intel again. I'll get the debug info for you later.
Comment 37 Tyler Foo 2014-08-21 07:00:46 UTC
It's weird: the Xorg log file doesn't even update. I recompiled the xf86-video-intel-git package with options=(debug !strip), and it did crash. Do you know what I might be doing wrong?
Comment 38 Chris Wilson 2014-08-21 07:11:13 UTC
Are you using journald? It hides logfiles and requires new incantations to retrieve them.
Comment 39 Tyler Foo 2014-08-21 07:28:22 UTC
(In reply to comment #38)
> Are you using journald? It hides logfiles and requires new incantations to
> retrieve them.

$ journalctl -b | grep -i 'intel_drv.so'
Aug 21 15:11:55 archins3437 gdm-Xorg-:0[512]: (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
Aug 21 15:26:46 archins3437 gdm-Xorg-:0[512]: (EE) 3: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fc26e9ec000+0x61e0e) [0x7fc26ea4de0e]
Aug 21 15:26:46 archins3437 gdm-Xorg-:0[512]: (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fc26e9ec000+0x4ddcf) [0x7fc26ea39dcf]
Aug 21 15:26:47 archins3437 gdm-Xorg-:0[1545]: (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
Aug 21 15:26:49 archins3437 gdm-Xorg-:1[1563]: (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
Comment 40 Tyler Foo 2014-08-21 07:30:13 UTC
$ addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x4ddcf 0x61e0e
/home/tf/sandbox/xf86-video-intel-git/src/xf86-video-intel-git/src/sna/sna_damage.h:49
/home/tf/sandbox/xf86-video-intel-git/src/xf86-video-intel-git/src/sna/sna_accel.c:10757
/home/tf/sandbox/xf86-video-intel-git/src/xf86-video-intel-git/src/sna/sna_damage.c:65
/home/tf/sandbox/xf86-video-intel-git/src/xf86-video-intel-git/src/sna/sna_damage.c:661
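The offset extraction described in comment 14 (take the hex value after the '+' for each intel_drv.so frame) can be scripted so the addr2line command line is built straight from the log. The following is a minimal Python sketch, assuming backtrace frames shaped like the journalctl output in comment 39; the name `addr2line_offsets` is hypothetical. It also checks that each offset equals the absolute address minus the module's load base, which is why the offset, not the absolute address, stays meaningful across runs. The `-i` flag tells addr2line to also print the inlined callers, which is why two addresses produced four source lines above.

```python
import re

# Matches a backtrace frame such as:
#   (EE) 3: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fc26e9ec000+0x61e0e) [0x7fc26ea4de0e]
# Groups: module path, load base, offset within the module, absolute address.
FRAME = re.compile(
    r"\(EE\)\s+\d+:\s+(\S+)\s+\((0x[0-9a-f]+)\+(0x[0-9a-f]+)\)\s+\[(0x[0-9a-f]+)\]"
)


def addr2line_offsets(log_lines, module="intel_drv.so"):
    """Return the in-module offsets of every backtrace frame from `module`."""
    offsets = []
    for line in log_lines:
        m = FRAME.search(line)
        if not m or not m.group(1).endswith(module):
            continue
        base, offset, absolute = (int(g, 16) for g in m.group(2, 3, 4))
        # Sanity check: the offset is the absolute address minus the load base.
        assert absolute - base == offset, line
        offsets.append(hex(offset))
    return offsets


log = [
    "Aug 21 15:26:46 archins3437 gdm-Xorg-:0[512]: (EE) 3: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fc26e9ec000+0x61e0e) [0x7fc26ea4de0e]",
    "Aug 21 15:26:46 archins3437 gdm-Xorg-:0[512]: (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7fc26e9ec000+0x4ddcf) [0x7fc26ea39dcf]",
]
offsets = addr2line_offsets(log)
print("addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i " + " ".join(offsets))
```

The offsets only resolve correctly against the exact intel_drv.so build that produced the backtrace, as comment 14 found out.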
Comment 41 Chris Wilson 2014-08-21 07:33:40 UTC
Back to building with debug=full and extracting the full debug log then...
Comment 42 Tyler Foo 2014-08-21 07:44:17 UTC
(In reply to comment #41)
> Back to building with debug=full and extracting the full debug log then...

I forgot how to add debug=full to the Arch PKGBUILD file. Simply changing the options line to options=(debug=full !strip) didn't work.
Comment 43 Chris Wilson 2014-08-21 07:59:49 UTC
Can you manually add --enable-debug=full to the build script? Or just build it by hand, which I think will be very easy on arch anyway since all the headers are already there, and so you just need ./configure --prefix=/usr --enable-debug=full
Comment 44 Tyler Foo 2014-08-21 12:26:50 UTC
(In reply to comment #43)
> Can you manually add --enable-debug=full to the build script? Or just build
> it by hand, which I think will be very easy on arch anyway since all the
> headers are already there, and so you just need ./configure --prefix=/usr
> --enable-debug=full

I don't know if I'm doing it right. I added --enable-debug=full to the PKGBUILD file (see the attached file). I think it's working, because my laptop is under heavy load and everything is slow. But I can't seem to get the log file I want. When I run "journalctl /usr/bin/Xorg", the latest log entries only go up to 2014-08-02.
Comment 45 Tyler Foo 2014-08-21 12:28:15 UTC
Created attachment 105036 [details]
PKGBUILD
Comment 46 Chris Wilson 2014-08-21 12:30:47 UTC
Ok, step 1 complete. I have no idea how to use journald though.
Comment 47 Tyler Foo 2014-08-21 12:51:38 UTC
(In reply to comment #46)
> Ok, step 1 complete. I have no idea how to use journald though.

Yep, it's driving me nuts. See the attached file, generated with 'journalctl -r --since="19:00" > xorg.log'. It's the best I can do. There was a crash around 19:20.
Comment 48 Tyler Foo 2014-08-21 12:52:18 UTC
Created attachment 105040 [details]
xorg log file
Comment 49 Chris Wilson 2014-08-22 06:20:59 UTC
I've built aegisub-3.2.0. Can you give me a crash course in reproducing the explosion?
Comment 50 Tyler Foo 2014-08-22 06:37:42 UTC
(In reply to comment #49)
> I've built aegisub-3.2.0. Can you give me a crash course in reproducing the
> explosion?

The audio file I was using was in FLAC format. I also tried converting it to AC3, with no difference. I didn't do much: simply working on the timing would cause Xorg to crash. In my case it didn't take long.
Comment 51 Chris Wilson 2014-08-22 07:05:41 UTC
Now consider that I have never used aegisub before...
Comment 52 Tyler Foo 2014-08-22 07:52:07 UTC
(In reply to comment #51)
> Now consider that I have never used aegisub before...

OK, I made a screencast of how I use it. Here is the Dropbox link: https://www.dropbox.com/s/3182zruiprbph41/screencast.mkv?m=

Hope it helps.
Comment 53 Chris Wilson 2014-09-08 07:20:43 UTC
I've messed around a few times now, taking steps similar to those you showed in the screencast. Nothing so far. Could you try reproducing the crash with --enable-debug=full? It will be much slower, has a chance of hiding the bug, and will generate a huge logfile - but I don't have a better idea yet.
Comment 54 Tyler Foo 2014-09-08 10:43:14 UTC
(In reply to comment #53)
> I've messed around a few times now, taking steps similar to those you showed
> in the screencast. Nothing so far. Could you try reproducing the crash with
> --enable-debug=full? It will be much slower, has a chance of hiding the bug,
> and will generate a huge logfile - but I don't have a better idea yet.

I think I got the file. But it's too big, and Dropbox is blocked again here in my country. Google Drive won't work either, and surprisingly neither does OneDrive. Any suggestions?
Comment 55 Chris Wilson 2014-09-08 10:46:30 UTC
How big is it after xz? Or try:


head -1500 Xorg.0.log > xorg.trunc
tail -3000 Xorg.0.log >> xorg.trunc

and then compress it.
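Keeping the first 1500 and the last 3000 lines preserves the server startup section and the lines leading up to the crash, while dropping the bulk of the debug spam in between. As an illustration, the Python sketch below produces the same lines as those two shell commands in a single pass, using a bounded `collections.deque` as the tail window. The function names and the default file paths are hypothetical placeholders. Like the shell version, a log shorter than 4500 lines ends up with some lines repeated in the output.

```python
from collections import deque
from itertools import islice


def head_and_tail(lines, head=1500, tail=3000):
    """Same lines as `head -<head> f; tail -<tail> f`, computed in one pass."""
    it = iter(lines)
    first = list(islice(it, head))
    # Seed the tail window with the head lines so short logs behave like `tail`.
    last = deque(first, maxlen=tail)
    last.extend(it)  # only the final `tail` lines survive in the window
    return first + list(last)


def truncate_log(src_path="Xorg.0.log", dst_path="xorg.trunc"):
    """Write the truncated copy of src_path to dst_path (paths are placeholders)."""
    with open(src_path, errors="replace") as src, open(dst_path, "w") as dst:
        dst.writelines(head_and_tail(src))
```

Streaming the file this way never holds more than head + tail lines in memory, which matters for multi-gigabyte full-debug logs like the 2.5 GB one mentioned earlier.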
Comment 56 Tyler Foo 2014-09-08 10:52:55 UTC
Created attachment 105888 [details]
xorg debug log

Ok. Those commands are amazing. No need to compress anymore. Here it is.
Comment 57 Chris Wilson 2014-09-08 12:04:15 UTC
(In reply to comment #56)
> Created attachment 105888 [details]
> xorg debug log
> 
> Ok. Those commands are amazing. No need to compress anymore. Here it is.

That looks like a u16 underflow. Do you have the stderr available? Usually /var/log/gdm/:0.log or similar.

This should test the underflow theory:

commit 30932a7b9d255c2037bee19e01aa3edc37b07386
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 8 12:41:06 2014 +0100

    sna: Avoid u16 underflow when computing reserved batch space
    
    If we filled the batch exactly, then subtract -1 for the reserved
    BATCH_BUFFER_END, it would underflow to a large value - convincing us
    that we had sufficient room to stuff many, many more commands in.
    
    However, all the callsites should be guarded by checking already that
    they had sufficient space to emit at least one operation...
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 58 Tyler Foo 2014-09-08 14:29:01 UTC
The commit just makes it worse. It crashes my X even when I'm not using Aegisub.
Comment 59 Chris Wilson 2014-09-08 15:18:26 UTC
Cool! Is it possible for you to get a stacktrace or updated debug log? Please?
Comment 60 Chris Wilson 2014-09-08 15:56:04 UTC
Sigh. All it took was loading aegisub for me, and it promptly crashed:

commit e0f7e9fc2f0b39b9e939ff48edea29950f125420
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 8 16:49:29 2014 +0100

    sna: Initialise and check for batch space
    
    commit 30932a7b9d255c2037bee19e01aa3edc37b07386
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Mon Sep 8 12:41:06 2014 +0100
    
        sna: Avoid u16 underflow when computing reserved batch space
    
    relied on gcc a little to much to warn me when I missed initialising 'rem'
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


Now any crashes should be interesting!
Comment 61 Tyler Foo 2014-09-09 00:10:26 UTC
Created attachment 105935 [details]
xorg debug log with the latest git commit.

Just tested the latest git commit. X still crashed when using Aegisub. Here is the log file with full debugging.
Comment 62 Chris Wilson 2014-09-09 06:10:36 UTC
Hmm, but is it the same assertion? Can you please look for the login manager logs for the stderr from Xorg? (e.g. /var/log/gdm/:0.log)
Comment 63 Chris Wilson 2014-09-09 06:39:32 UTC
Also added a little more DBG messages before that assert:

commit faf0bdd477b9ec73f943c3101a3ae30fd6d579ea
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 9 07:36:40 2014 +0100

    sna: Add some DBG spam for BLT boxes
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 64 Tyler Foo 2014-09-09 07:21:20 UTC
(In reply to comment #62)
> Hmm, but is it the same assertion? Can you please look for the login manager
> logs for the stderr from Xorg? (e.g. /var/log/gdm/:0.log)

I can't get the gdm logs because, again, they aren't updated, just like what happened with the Xorg logs before. The only reason I could get the Xorg logs was that I disabled gdm and used startx to start X, which runs rootless that way, but not when using a display manager.
Comment 65 Chris Wilson 2014-09-09 07:26:15 UTC
On the VT you ran startx, and presumably switched back to after the crash, did you notice the line number of the assert (and what the assert was)?

I doubt it will be a eureka moment, but hopefully will narrow down where to look next.
Comment 66 Tyler Foo 2014-09-09 08:20:56 UTC
(In reply to comment #65)
> On the VT you ran startx, and presumably switched back to after the crash,
> did you notice the line number of the assert (and what the assert was)?
> 
> I doubt it will be a eureka moment, but hopefully will narrow down where to
> look next.

Been playing with Aegisub for half an hour, no crash so far.
Comment 67 Chris Wilson 2014-09-10 06:59:20 UTC
If you have a second computer (or smartphone and a ssh app), could you switch back to --enable-debug and run with gdb attached to Xorg via the remote connection? It is possible that the extra DBG is masking the bug, but catching the assertion would hopefully be enough information. Alternatively, we could try and find where systemd hides that information!
Comment 68 Tyler Foo 2014-09-10 07:39:39 UTC
Yeah, I have a smartphone, but you'll have to handhold me to finish the gdb stuff. But first I can check if the bug is still there by running Aegisub without --enable-debug, right?
Comment 69 Chris Wilson 2014-09-10 07:44:51 UTC
Yeah, hopefully the --enable-debug is just causing it to crash earlier and so help narrow down the root cause. Without --enable-debug, it should mostly work, or at least crash in the same location as before.
Comment 70 Tyler Foo 2014-09-10 08:15:00 UTC
(In reply to comment #69)
> Yeah, hopefully the --enable-debug is just causing it to crash earlier and
> so help narrow down the root cause. Without --enable-debug, it should mostly
> work, or at least crash in the same location as before.

Yeah, it won't crash, but Aegisub feels kind of sluggish. I think I'm ready to try other options.
Comment 71 Chris Wilson 2014-09-10 08:23:39 UTC
If it feels slow, I would guess that X is using sw fallbacks (perhaps a GPU hang?). Anything in dmesg/Xorg.0.log? (If you can extract such from journald).
Comment 72 Tyler Foo 2014-09-10 08:32:55 UTC
Created attachment 106030 [details]
dmesg output

You can take a look at the dmesg output.
Comment 73 Chris Wilson 2014-09-10 08:39:55 UTC
Ok, not a GPU hang. Could I check the Xorg.0.log equiv?
Comment 74 Tyler Foo 2014-09-10 08:47:21 UTC
(In reply to comment #72)
> Created attachment 106030 [details]
> dmesg output
> 
> You can take a look at the dmesg output.

(In reply to comment #73)
> Ok, not a GPU hang. Could I check the Xorg.0.log equiv?

No Xorg.0.log since I enabled gdm, which means X is running as root. But I just figured out why Xorg.0.log isn't updated when X is running as root. I'm going to do a test and will report back. By the way, can I just install the stable version of xf86-video-intel and test with --enable-debug=full? I'm afraid it still won't crash with the latest git commit.
Comment 75 Chris Wilson 2014-09-10 08:53:48 UTC
(In reply to comment #74)
> Btw, can I just install the stable
> version of xf86-video-intel and test with --enable-debug=full? 'Cause I'm
> afraid it still won't crash with the latest git commit.

And find the assert? Yes. It's just annoying that you then won't have the extra debugging I added for this bug.
Comment 76 Tyler Foo 2014-09-10 09:35:23 UTC
Created attachment 106034 [details]
previous gdm logs outputted with command "journalctl -r /usr/bin/gdm"

Previous gdm logs outputted with command "journalctl -r /usr/bin/gdm". Take a look. Now I know how to get latest Xorg logs using journalctl. Just need to get X to crash now.
Comment 77 Chris Wilson 2014-09-10 09:42:09 UTC
Hmm, that didn't include the usual information I would expect from /var/log/gdm/:0.log. Perhaps try:

journalctl -r /usr/bin/gdm + /usr/bin/gnome-shell + /usr/bin/Xorg
Comment 78 Tyler Foo 2014-09-10 09:52:17 UTC
Created attachment 106040 [details]
xorg log for today

This is the xorg log for today. No log entries for gnome shell.
Comment 79 Chris Wilson 2014-09-10 10:02:31 UTC
Using DRI3+Present, it will feel sluggish. Nothing else stands out - I was looking for a message that it failed to submit some rendering and disabled acceleration, but that is absent.
Comment 80 Tyler Foo 2014-09-10 10:16:04 UTC
Created attachment 106046 [details]
journalctl -r -b -1 /usr/bin/Xorg.bin

Just had a crash. But I still don't know how to get the /var/log/gdm/:0.log file you need using journalctl.
Comment 81 Tyler Foo 2014-09-10 10:19:08 UTC
Created attachment 106047 [details]
journalctl -r -b -1 /usr/bin/gdm

Is this the assertion you are looking for?
Comment 82 Chris Wilson 2014-09-10 11:00:41 UTC
No. :(

If you are building the crashy version from git, could you do:

$ cd xf86-video-intel
$ git fetch
$ git cherry-pick 224af800f695b50ba5a65b5a2b9ca1e7a88d4e1a
$ make && sudo make install

That will (hopefully!) dump the assert to where journalctl /usr/bin/Xorg.bin will find it.
Comment 83 Tyler Foo 2014-09-10 11:06:32 UTC
(In reply to comment #82)
> No. :(
> 
> If you are building the crashy version from git, could you do:
> 
> $ cd xf86-video-intel
> $ git fetch
> $ git cherry-pick 224af800f695b50ba5a65b5a2b9ca1e7a88d4e1a
> $ make && sudo make install
> 
> That will (hopefully!) dump the assert to where journalctl /usr/bin/Xorg.bin
> will find it.

I just used the xf86-video-intel-git from arch aur. Can you help figure out how to add the git commands to the PKGBUILD file? Here is the link: https://aur.archlinux.org/packages/xf/xf86-video-intel-git/PKGBUILD
Comment 84 Chris Wilson 2014-09-10 11:10:33 UTC
Created attachment 106052 [details]
PKGBUILD

I think this should do, just adding git fetch && git cherry-pick before the actual build.
Comment 85 Tyler Foo 2014-09-10 11:15:09 UTC
(In reply to comment #84)
> Created attachment 106052 [details]
> PKGBUILD
> 
> I think this should do, just adding git fetch && git cherry-pick before the
> actual build.

And should I also add --enable-debug=full, then wait for it to crash?
Comment 86 Chris Wilson 2014-09-10 11:19:05 UTC
Wait with fingers crossed...
Comment 87 Tyler Foo 2014-09-10 12:43:36 UTC
Created attachment 106055 [details]
Build error

Had this error when building.
Comment 88 Chris Wilson 2014-09-10 12:46:55 UTC
Hmm, it is building with the patch already included. Just scrub the two extra lines in PKGBUILD and press onwards.
Comment 89 Tyler Foo 2014-09-10 13:10:16 UTC
Created attachment 106058 [details]
Latest xorg log

X crashed and here is the log. Tell me if you find what you need.
Comment 90 Chris Wilson 2014-09-10 13:25:42 UTC
No. It is the right build, so it should be dumping the FatalError on asserts now. What's odd in the log file is:

Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler
Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_accel_do_throttle -- no pending activity
Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: has_shadow: has pending damage? 0, outstanding flips: 0
Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_block_handler (tv=597.543000)
Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler
Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_do_throttle (time=528700), triggered
Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_scanout_do_flush: flush timer active: delta=15
Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_block_handler (tv=-1.0)
Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_flush: flush?=0, dirty?=0
Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_flush: flush?=0, dirty?=0
Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_wakeup_handler: nbatch=0, need_retire=0, need_purge=0
Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_wakeup_handler

There is a 30 second period where one X disappears and new one starts up without that transition being logged at all. (The log begins with usual startup for Xorg-:0[541], but nothing at all is visible for Xorg-:1[1187])

ARGH.
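Spotting this kind of gap by eye in a reverse-ordered journal (`journalctl -r`) is tedious. One way to automate it is to group lines by process and report each process's first and last timestamps. The Python sketch below assumes lines in the syslog-like layout of journalctl's default "short" output seen above (`Mon DD HH:MM:SS host ident[pid]: message`); the name `process_spans` is hypothetical, and because these timestamps carry no year, it assumes 2014.

```python
import re
from datetime import datetime

# Syslog-style journal line, e.g.
#   "Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler"
# Groups: timestamp, identifier, PID.
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ (\S+?)\[(\d+)\]: ")


def process_spans(lines, year=2014):
    """Map (identifier, pid) to its (first, last) timestamp; line order is irrelevant."""
    spans = {}
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        # Short journal timestamps omit the year, so it has to be supplied.
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        key = (m.group(2), int(m.group(3)))
        first, last = spans.get(key, (ts, ts))
        spans[key] = (min(first, ts), max(last, ts))
    return spans


# A few of the lines quoted above, in the reverse order `journalctl -r` prints them.
log = [
    "Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler",
    "Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_accel_do_throttle -- no pending activity",
    "Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_do_throttle (time=528700), triggered",
    "Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_wakeup_handler",
]
spans = process_spans(log)
old_last = spans[("gdm-Xorg-:0", 541)][1]
new_first = spans[("gdm-Xorg-:1", 1187)][0]
print(f"silent gap between the two X servers: {new_first - old_last}")
```

This only measures the gap; it cannot recover messages the journal never captured, which is the actual problem here.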
Comment 91 Tyler Foo 2014-09-10 13:30:30 UTC
(In reply to comment #90)
> No. It is the right build, so it should be dumping the FatalError on asserts
> now. What's odd in the log file is:
> 
> Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler
> Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_accel_do_throttle -- no pending activity
> Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: has_shadow: has pending damage? 0, outstanding flips: 0
> Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_block_handler (tv=597.543000)
> Sep 10 21:03:58 archins3437 gdm-Xorg-:1[1187]: sna_wakeup_handler
> Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_do_throttle (time=528700), triggered
> Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_scanout_do_flush: flush timer active: delta=15
> Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_block_handler (tv=-1.0)
> Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_flush: flush?=0, dirty?=0
> Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_flush: flush?=0, dirty?=0
> Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_accel_wakeup_handler: nbatch=0, need_retire=0, need_purge=0
> Sep 10 21:03:29 archins3437 gdm-Xorg-:0[541]: sna_wakeup_handler
> 
> There is a 30 second period where one X disappears and new one starts up
> without that transition being logged at all. (The log begins with usual
> startup for Xorg-:0[541], but nothing at all is visible for Xorg-:1[1187])
> 
> ARGH.

What if I don't use GDM to start X, and instead use startx to start a rootless X? Then when the crash happens, the assert should be displayed on the VT. Will this do?
Comment 92 Chris Wilson 2014-09-10 13:34:27 UTC
(In reply to comment #91) 
> Maybe I don't use GDM to start my X and just use startx to start a rootless
> X and when the crash happens, the assert should be displayed on the VT? Will
> this do?

Yes. All I want (today!) is that fatal error message telling me which line to look at. If you use startx, it should be visible immediately after X crashes.
Comment 93 Tyler Foo 2014-09-10 13:36:48 UTC
(In reply to comment #92)
> (In reply to comment #91) 
> > Maybe I don't use GDM to start my X and just use startx to start a rootless
> > X and when the crash happens, the assert should be displayed on the VT? Will
> > this do?
> 
> Yes. All I want (today!) is that fatal error message telling me which line
> to look at. If you use startx, it should be visible immediately after X
> crashes.

OK. Fingers crossed.
Comment 94 Tyler Foo 2014-09-10 14:15:38 UTC
Created attachment 106061
Photos of the error message.

Here you go. Please tell me it's what you need. Otherwise we're both gonna go crazy.
Comment 95 Tyler Foo 2014-09-10 14:16:22 UTC
Created attachment 106062
left screen

Photos of the error message.
Comment 96 Tyler Foo 2014-09-10 14:16:55 UTC
Created attachment 106063
right screen

Photos of the error message.
Comment 97 Chris Wilson 2014-09-10 15:11:50 UTC
Erf. That's weird. However, look in ~/.local/share/xorg/Xorg.0.log*
Comment 98 Tyler Foo 2014-09-10 15:19:28 UTC
Created attachment 106064
Xorg.0.log.old with head -1500 and tail -3000

OK, tell me this is it. I saw "sna_damage_add:48 assertion '!DAMAGE_IS_ALL(*damage)' failed"
Comment 99 Chris Wilson 2014-09-10 15:24:01 UTC
Yes! We have finally struck gold!
Comment 100 Tyler Foo 2014-09-10 15:27:11 UTC
(In reply to comment #99)
> Yes! We have finally struck gold!

Yay! Glad we got this son of ***. And I desperately need my sleep. Let me know if you need more debug info. Thanks.
Comment 101 Chris Wilson 2014-09-10 16:00:14 UTC
Scratches head.

I doubt that's the original bug, but here's the fix for that assert:

commit 9b25eeee85d32223841640c3a39901e4b63707ce
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Sep 10 16:37:16 2014 +0100

    sna: Do apply damage twice for miSpans.PolyFillRect
    
    As the caller will apply the damage afterwards, we do not need to do the
    accumulation in the miSpans callbacks and it presumes that its damage is
    unaltered.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
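To make the commit message concrete, here is a minimal C sketch of the failure mode it describes, under loudly stated assumptions: `struct damage`, `damage_add()` and `fill_spans()` are hypothetical stand-ins, not the driver's sna_damage API, and "damage" is reduced to an accumulated area that collapses to an "all" state once it covers a 100-unit pixmap. The caller checks the damage state once, lets the per-span callbacks run, then applies the combined damage itself, presuming it is unaltered. If the callbacks also accumulate, the damage can reach "all" in the meantime, and the caller's own add then breaks a `!DAMAGE_IS_ALL`-style precondition. To keep the sketch testable, the check counts violations instead of aborting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the real driver tracks regions, not a single area. */
#define PIXMAP_AREA 100

struct damage {
    bool all;       /* stand-in for DAMAGE_IS_ALL(): whole pixmap damaged */
    int area;       /* stand-in for the accumulated damage region */
    int violations; /* times damage was added after it was already "all" */
};

/* Callers are expected to add damage only while it is not yet "all".
 * Instead of asserting, record a violation so the sketch can be tested. */
static void damage_add(struct damage *d, int area)
{
    if (d->all) {
        d->violations++;
        return;
    }
    d->area += area;
    if (d->area >= PIXMAP_AREA)
        d->all = true; /* collapse to "everything is damaged" */
}

/* Fill n spans. The caller inspects the damage once up front and applies
 * the combined damage afterwards, presuming nothing else touched it. */
static void fill_spans(struct damage *d, const int *spans, int n,
                       bool callbacks_add_damage)
{
    bool was_all = d->all;
    int total = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        total += spans[i];
        /* Buggy variant: the per-span callback accumulates damage too. */
        if (callbacks_add_damage && !d->all)
            damage_add(d, spans[i]);
    }

    if (!was_all)
        damage_add(d, total); /* the caller's single application */
}
```

With spans {40, 40, 30}, the variant where the callbacks also add damage records one violation, while the caller-only variant records none; with smaller spans the callback variant does not trip the check but double-counts the area instead.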
Comment 102 Tyler Foo 2014-09-10 23:16:50 UTC
(In reply to comment #101)

Thanks, will test it out. By the way, should I disable DRI3? I read in the Arch news that it has multiple rendering bugs, so it's disabled by default in the xf86-video-intel package in their official repo.
Comment 103 Tyler Foo 2014-09-10 23:39:36 UTC
Created attachment 106093
Xorg log without --enable-debug=full

OK. It crashed again.
Comment 104 Chris Wilson 2014-09-11 06:06:59 UTC
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) Caught signal 11 (Segmentation fault). Server aborting
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: Fatal server error:
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE)
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) Segmentation fault at address 0x0
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE)
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 11: /usr/bin/Xorg.bin (0x400000+0x25d0e) [0x425d0e]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 10: /usr/lib/libc.so.6 (__libc_start_main+0xf0) [0x7f2dca9e0000]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 9: /usr/bin/Xorg.bin (0x400000+0x3b866) [0x43b866]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 8: /usr/bin/Xorg.bin (0x400000+0x376d7) [0x4376d7]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 7: /usr/bin/Xorg.bin (0x400000+0x33caa) [0x433caa]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 6: /usr/bin/Xorg.bin (0x400000+0x11c74a) [0x51c74a]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 5: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f2dc5c08000+0x4f08b) [0x7f2dc5c5708b]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f2dc5c08000+0x63c26) [0x7f2dc5c6bc26]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 3: /usr/lib/libpixman-1.so.0 (pixman_region_fini+0x9) [0x7f2dcb91b879]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 2: /usr/lib/libc.so.6 (0x7f2dca9c0000+0x33df0) [0x7f2dca9f3df0]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 1: /usr/bin/Xorg.bin (0x400000+0x197b69) [0x597b69]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) 0: /usr/bin/Xorg.bin (xorg_backtrace+0x56) [0x593966]
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE) Backtrace:
Sep 11 07:33:44 archins3437 gdm-Xorg-:0[520]: (EE)


Hmm, no --enable-debug at all? Could you run

addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x4f08b 0x63c26
addr2line -e /usr/bin/Xorg.bin -i 0x3b866 0x376d7 0x33caa 0x11c74a

and then rebuild with --enable-debug again.
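The offsets in those addr2line commands come straight from the backtrace, where each frame prints `(module base+offset) [absolute address]`. A small C sketch of that step, assuming exactly the frame format shown above (the function name `frame_offset` is hypothetical): it takes the last parenthesised group, reads base, offset and address, and checks that base + offset equals the printed address. Frames that show a symbol instead of a numeric base, such as `(xorg_backtrace+0x56)`, are rejected. The module-relative offset is what addr2line expects for a shared object like intel_drv.so; for a non-PIE executable mapped at 0x400000, as Xorg.bin appears to be here, addr2line generally expects the absolute address (e.g. 0x43b866) instead.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse a backtrace frame such as
 *   ".../intel_drv.so (0x7f2dc5c08000+0x4f08b) [0x7f2dc5c5708b]"
 * and return 1 with the module-relative offset in *offset, or 0 if the
 * frame shows a symbol (e.g. "(xorg_backtrace+0x56)") or is inconsistent. */
static int frame_offset(const char *frame, uint64_t *offset)
{
    uint64_t base, off, addr;
    const char *p;

    assert(frame != NULL && offset != NULL);
    p = strrchr(frame, '('); /* last group; skips the "(EE)" prefix */
    if (p == NULL)
        return 0;
    /* %x accepts an optional 0x prefix, matching the log's formatting. */
    if (sscanf(p, "(%" SCNx64 "+%" SCNx64 ") [%" SCNx64 "]",
               &base, &off, &addr) != 3)
        return 0;
    if (base + off != addr) /* both forms are printed; they must agree */
        return 0;
    *offset = off;
    return 1;
}
```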
Comment 105 Chris Wilson 2014-09-11 06:08:16 UTC
(In reply to comment #102)
> Thanks, will test it out. Btw, should I disable DRI3? 'Cause I read on Arch
> news that there are multiple rendering bugs, so it's disabled by default in
> the xf86-video-intel of their official repo.

It's disabled because it is incomplete and lacks true synchronisation with X (and compositors), which results in delayed rendering (the rendering should be mostly correct, just at the wrong time).
Comment 106 Tyler Foo 2014-09-11 06:19:21 UTC
[tf@archins3437 ~]$ addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x4f08b 0x63c26
??:0
??:0
[tf@archins3437 ~]$ addr2line -e /usr/bin/Xorg.bin -i 0x3b866 0x376d7 0x33caa 0x11c74a
??:0
??:0
??:0
??:0

What's wrong? I'll build with --enable-debug=full again.
Comment 107 Chris Wilson 2014-09-11 06:24:19 UTC
Oh I forgot, Arch completely strips all debug symbols from its installs (and doesn't install the separate dso.debug). There's probably a way to prevent that if you know Arch well enough, but let's hope it presents itself again with --enable-debug=full :|
Comment 108 Tyler Foo 2014-09-11 06:28:36 UTC
(In reply to comment #107)
> Oh I forgot, Arch completely strips all debug symbols from its installs (and
> doesn't install the separate dso.debug). There's probably a way to prevent
> that if you know Arch well enough, but let's hope it presents itself again
> with --enable-debug=full :|

I've put "!strip" in the options for this build. Let's hope it crashes soon enough.
Comment 109 Tyler Foo 2014-09-11 06:57:10 UTC
Created attachment 106105
Xorg log without --enable-debug=full
Comment 110 Chris Wilson 2014-09-11 07:20:41 UTC
Sep 11 14:51:04 archins3437 gdm-Xorg-:0[1247]: (II) UnloadModule: "evdev"
Sep 11 14:51:04 archins3437 gdm-Xorg-:0[1247]: (II) evdev: Dell WMI hotkeys: Close
Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: _sna_blt_fill_boxes: ffffff x 1
Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: box_from_seg: seg=(1,0),(1,124); box=(1,0),(2,124)
Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: __kgem_bo_mark_dirty: handle=19 (proxy? 0)
Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: kgem_add_handle: handle=19, index=0
Sep 11 14:50:34 archins3437 gdm-Xorg-:0[504]: kgem_add_reloc: handle=19, pos=4, delta=0, domains=28002

You wouldn't happen to have the missing 30s? /o\
Comment 111 Chris Wilson 2014-09-11 07:25:21 UTC
Looking at the logfile it appears that journald only outputs a small amount of the log every 30s. I think we will just have to rely on --enable-debug (not --enable-debug=full).
Comment 112 Tyler Foo 2014-09-11 07:27:19 UTC
(In reply to comment #111)
> Looking at the logfile it appears that journald only outputs a small amount
> of the log every 30s. I think we will just have to rely on --enable-debug
> (not --enable-debug=full).

OK. Let's just say GDM and journald suck. Or should I try again with startx?
Comment 113 Chris Wilson 2014-09-11 07:29:33 UTC
If you have time to run with --enable-debug=full, please, please do so. :) In which case use startx whilst on a debugging run.
Comment 114 Tyler Foo 2014-09-11 08:11:35 UTC
Created attachment 106109
Xorg.0.log.old (2014-09-11T16-06) with head -1500 and tail -3000

I saw this: sna_damage_add_to_pixmap:75 assertion '!DAMAGE_IS_ALL(*damage)' failed
Comment 115 Chris Wilson 2014-09-11 08:40:11 UTC
It's still the head-scratcher from last night, just a second place along the same path that also modified the damage.

commit 797369449b87cbd578f9fb96f34b065e548755f6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Sep 11 09:37:11 2014 +0100

    sna: Do not mark the pixmap as cleared in the middle of a miSpans decomposition
    
    As the miSpans will continue to overdraw the Pixmap, it's final state
    will no longer be that clear value. We need to be much more careful when
    allowing that optimisation.
    
    Reported-by: Tyler Foo <tftylerfoo@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=77074
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

That should finally put that to rest - it could be the cause of death, but it might not be...
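The second commit concerns an optimisation hint rather than damage. Below is a minimal C sketch of the hazard, using a hypothetical 8-pixel single-row "pixmap" whose `clear`/`clear_color` fields stand in for a "this pixmap is one solid colour" state (names and layout are illustrative, not the driver's). It assumes span writes inside the decomposition go straight to the pixels without revisiting the hint. Marking the pixmap clear as soon as one span covers it leaves a stale hint once later spans overdraw part of it; deciding only after the decomposition has finished, and only for a single whole-pixmap fill, keeps the hint truthful.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PIXMAP_WIDTH 8 /* hypothetical single-row pixmap */

struct pixmap {
    bool clear;            /* hint: every pixel equals clear_color */
    uint32_t clear_color;
    uint32_t pixels[PIXMAP_WIDTH];
};

struct span {
    int x1, x2;            /* fills pixels [x1, x2) */
    uint32_t color;
};

static bool covers_all(const struct span *s)
{
    return s->x1 == 0 && s->x2 == PIXMAP_WIDTH;
}

/* Draw one miSpans-style decomposition made of n spans. */
static void fill_spans(struct pixmap *p, const struct span *spans, int n,
                       bool mark_clear_mid_decomposition)
{
    assert(n > 0);
    p->clear = false; /* new drawing invalidates any previous hint */

    for (int i = 0; i < n; i++) {
        const struct span *s = &spans[i];

        assert(0 <= s->x1 && s->x1 <= s->x2 && s->x2 <= PIXMAP_WIDTH);
        for (int x = s->x1; x < s->x2; x++)
            p->pixels[x] = s->color; /* writes bypass the hint */

        /* Buggy variant: later spans may still overdraw this colour. */
        if (mark_clear_mid_decomposition && covers_all(s)) {
            p->clear = true;
            p->clear_color = s->color;
        }
    }

    /* Safer variant: decide once the decomposition is complete. */
    if (!mark_clear_mid_decomposition && n == 1 && covers_all(&spans[0])) {
        p->clear = true;
        p->clear_color = spans[0].color;
    }
}

/* True if the hint, when set, really describes the pixels. */
static bool clear_hint_valid(const struct pixmap *p)
{
    if (!p->clear)
        return true;
    for (int x = 0; x < PIXMAP_WIDTH; x++)
        if (p->pixels[x] != p->clear_color)
            return false;
    return true;
}
```

A white whole-row span followed by a black partial span leaves the early-marking variant claiming the pixmap is solid white, which is exactly the "final state will no longer be that clear value" problem from the commit message.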
Comment 116 Tyler Foo 2014-09-11 08:44:29 UTC
(In reply to comment #115)

OK. I'll test a little more when I have time.
Comment 117 Chris Wilson 2014-09-25 09:20:51 UTC
Any news?
Comment 118 Tyler Foo 2014-09-25 09:55:15 UTC
(In reply to comment #117)
> Any news?

I've used Aegisub a couple of times. Nothing so far.
Comment 119 Chris Wilson 2014-09-28 07:02:03 UTC
I am going to claim it's fixed - that's bound to provoke it into failing again!
Comment 120 Tyler Foo 2014-10-24 23:36:12 UTC
Created attachment 108375
dmesg

Hey Chris, my computer had multiple freezes this month. Can you take a look at the dmesg to see if it has something to do with this bug?

dmesg and journalctl output attached.
Comment 121 Tyler Foo 2014-10-24 23:37:08 UTC
Created attachment 108376
journalctl output
Comment 122 Tyler Foo 2014-10-24 23:38:54 UTC
If this has nothing to do with this bug, I'll file a separate bug instead.
Comment 123 Chris Wilson 2014-10-25 08:58:43 UTC
Yes, that is a separate (kernel) bug. We have had a few reports like that and we put a workaround into 3.17 to reduce the impact, but we don't know what's causing it yet.
Comment 124 Tyler Foo 2014-10-25 09:10:38 UTC
(In reply to Chris Wilson from comment #123)
> Yes, that is a separate (kernel) bug. We have had a few reports like that
> and we put a workaround into 3.17 to reduce the impact, but we don't know
> what's causing it yet.

OK, this is good and bad. Good that it's not this bug again. Bad that we don't have a solution yet and it's really hard to reproduce; it just happens. So will downgrading the kernel help? Or should I downgrade the xf86-video-intel driver? I really need a working system these days.
Comment 125 Chris Wilson 2014-10-25 09:33:53 UTC
As far as we know, this bug was introduced in kernel 3.16.
Comment 126 Mikolaj 2019-01-10 06:21:45 UTC
I think this problem is fixed. I encountered the same issue on OpenBSD with GIMP, with a very easy repro case: just opening a new file and navigating the file browser to the recently opened files crashed Xorg every time. More details are reported here:

https://marc.info/?l=openbsd-bugs&m=154706833406795&w=2

GDB details from openbsd-bugs email:

(gdb) bt
#0  0x00000aeb3630ff3a in sna_blt_copy_boxes (sna=0xaeb33262000, alu=3 '\003', src_bo=0xaeb79f86400, src_dx=0, src_dy=0,
    dst_bo=0xaeb79f8a200, dst_dx=0, dst_dy=0, bpp=32, box=0xaeb63870000, nbox=0)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_blt.c:3759
#1  0x00000aeb363544e9 in no_render_copy_boxes (sna=0xaeb33262000, alu=3 '\003', src=0xaeb7ab1b080, src_bo=0xaeb79f86400, src_dx=0,
    src_dy=0, dst=0xaeb7ab1b080, dst_bo=0xaeb79f8a200, dst_dx=0, dst_dy=0, box=0xaeb63868010, n=2038, flags=0)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_render.c:137
#2  0x00000aeb362d2907 in sna_pixmap_move_to_gpu (pixmap=0xaeb7ab1b080, flags=10)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_accel.c:4246
#3  0x00000aeb362f375a in sna_copy_boxes (src=0xaeb7ab1b080, dst=0xaeb1507e400, gc=0xaeacb235a00, region=0x7f7ffffe9750, dx=-616,
    dy=-72, bitplane=0, closure=0x0) at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_accel.c:6387
#4  0x00000aeb362f5122 in sna_do_copy (src=0xaeb7ab1b080, dst=0xaeb1507e400, gc=0xaeacb235a00, sx=0, sy=0, width=1535, height=1012,
    dx=616, dy=72, copy=0xaeb362f2f00 <sna_copy_boxes>, bitPlane=0, closure=0x0)
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_accel.c:6959
#5  0x00000aeb362dd3c7 in sna_copy_area (src=0xaeb7ab1b080, dst=0xaeb1507e400, gc=0xaeacb235a00, src_x=0, src_y=0, width=1535,
    height=1012, dst_x=245, dst_y=71) at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_accel.c:7041
#6  0x00000ae8a1bdd17d in damageCopyArea (pSrc=0xaeb7ab1b080, pDst=0xaeb1507e400, pGC=0xaeacb235a00, srcx=0, srcy=0, width=1535,
    height=1012, dstx=245, dsty=71) at /home/mkucharski/openbsd/xenocara/xserver/miext/damage/damage.c:775
#7  0x00000ae8a1a4728a in ProcCopyArea (client=0xaeb6c1f3800) at /home/mkucharski/openbsd/xenocara/xserver/dix/dispatch.c:1722
#8  0x00000ae8a1a41df0 in Dispatch () at /home/mkucharski/openbsd/xenocara/xserver/dix/dispatch.c:480
#9  0x00000ae8a1a55479 in dix_main (argc=7, argv=0x7f7ffffe9b18, envp=0x7f7ffffe9b58)
    at /home/mkucharski/openbsd/xenocara/xserver/dix/main.c:287
#10 0x00000ae8a1a2e357 in main (argc=7, argv=0x7f7ffffe9b18, envp=0x7f7ffffe9b58)
    at /home/mkucharski/openbsd/xenocara/xserver/dix/stubmain.c:34
(gdb) list /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_blt.c:3759
3754
3755                                            assert(box->x1 >= 0);
3756                                            assert(box->y1 >= 0);
3757
3758                                            *(uint64_t *)&b[0] = hdr;
3759                                            *(uint64_t *)&b[2] = *(const uint64_t *)box;
3760                                            *(uint64_t *)(b+4) =
3761                                                    kgem_add_reloc64(kgem, kgem->nbatch + 4, dst_bo,
3762                                                                     I915_GEM_DOMAIN_RENDER << 16 |
3763                                                                     I915_GEM_DOMAIN_RENDER |

...

(gdb) print box
$2 = (const BoxRec *) 0xaeb63870000
(gdb) print *(const uint64_t *)box
Cannot access memory at address 0xaeb63870000

...

(gdb) print *(const uint64_t *) 0xaeb63870000
Cannot access memory at address 0xaeb63870000
(gdb) print *(const uint64_t *) 0xaeb63868010
$5 = 568481871298560

What I see in the above backtrace: inside sna_blt_copy_boxes() box=0xaeb63870000, whereas
in no_render_copy_boxes() box=0xaeb63868010, and that results in the Xorg crash when the
box variable is accessed.

(gdb) bt
#0  0x00000aeb3630ff3a in sna_blt_copy_boxes (sna=0xaeb33262000, alu=3 '\003', src_bo=0xaeb79f86400, src_dx=0, src_dy=0,
    dst_bo=0xaeb79f8a200, dst_dx=0, dst_dy=0, bpp=32, box=0xaeb63870000, nbox=0)
                                                      ^^^^^^^^^^^^^^^^^
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_blt.c:3759
#1  0x00000aeb363544e9 in no_render_copy_boxes (sna=0xaeb33262000, alu=3 '\003', src=0xaeb7ab1b080, src_bo=0xaeb79f86400, src_dx=0,
    src_dy=0, dst=0xaeb7ab1b080, dst_bo=0xaeb79f8a200, dst_dx=0, dst_dy=0, box=0xaeb63868010, n=2038, flags=0)
                                                                           ^^^^^^^^^^^^^^^^^
    at /home/mkucharski/openbsd/xenocara/driver/xf86-video-intel/src/sna/sna_render.c:137
...
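Taking the two printed `box` values at face value, the distance between them can be checked with simple arithmetic. The sketch below assumes BoxRec is four 16-bit coordinates (8 bytes), which matches the single 8-byte copy on line 3759; `Box16` and `analyse()` are hypothetical names. It shows that the crashing pointer sits 0x7ff0 bytes (4094 boxes) past the caller's pointer, while n=2038 boxes span only 16304 bytes, and that the crash address is 4 KiB-aligned. That alignment is consistent with, though it does not prove, a read running off the end of a mapping into an unmapped page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in with BoxRec's layout: four 16-bit coordinates. */
typedef struct {
    int16_t x1, y1, x2, y2;
} Box16;

struct box_walk {
    uint64_t bytes_advanced; /* crash pointer minus caller's pointer */
    uint64_t boxes_advanced; /* the same distance counted in boxes */
    uint64_t array_bytes;    /* bytes covered by the n boxes passed in */
    int past_end;            /* 1 if the crash pointer is at/after the end */
};

static struct box_walk analyse(uint64_t caller_box, uint64_t crash_box,
                               uint64_t n)
{
    struct box_walk w;

    assert(crash_box >= caller_box);
    w.bytes_advanced = crash_box - caller_box;
    w.boxes_advanced = w.bytes_advanced / sizeof(Box16);
    w.array_bytes = n * sizeof(Box16);
    w.past_end = w.bytes_advanced >= w.array_bytes;
    return w;
}

static int page_aligned(uint64_t addr, uint64_t page_size)
{
    return addr % page_size == 0;
}
```

Called as `analyse(0xaeb63868010, 0xaeb63870000, 2038)`, it reports a walk of 4094 boxes against an array of 2038, i.e. past the end of what the caller handed over.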


Yesterday I compiled e5ff8e1828f97891c819c919d7115c6e18b2eb1f from https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel.git. The only problem along the way was Bugzilla bug 109268 (byteswap.h is not available on OpenBSD), and the crash is gone with the latest code of the xf86-video-intel driver.

