Bug 57118 - X stops sending DAMAGE
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.7 (2012.06)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
Reported: 2012-11-14 15:15 UTC by Tassilo Horn
Modified: 2013-02-21 17:59 UTC

Attachments
Xorg log (38.66 KB, text/plain)
2012-11-14 15:15 UTC, Tassilo Horn
Backtrace from: gdb /usr/bin/X `pgrep X` (11.59 KB, text/plain)
2012-11-14 15:16 UTC, Tassilo Horn
Another backtrace from git (4.86 KB, text/plain)
2012-11-29 13:10 UTC, Tassilo Horn
The dmesg output after the freeze (86.11 KB, text/plain)
2012-11-29 13:12 UTC, Tassilo Horn
Backtraces of X/gnome-shell + x11trace information (328.35 KB, application/x-bzip)
2012-12-03 08:41 UTC, Tassilo Horn
More backtraces of X/gnome-shell + x11trace information (1.34 MB, application/x-bzip)
2012-12-03 08:41 UTC, Tassilo Horn

Description Tassilo Horn 2012-11-14 15:15:17 UTC
Created attachment 70072 [details]
Xorg log

Recently, my X server (1.13.0) has started to lock up frequently.  The symptoms are that the screen stays as it is and I can move the mouse, but I can't select a window, type anything, or switch to a console using Ctrl-Alt-Fx.

So far, lockups have only occurred when a second monitor was attached to my notebook.  When I then press Ctrl-Alt-Fx to try to switch to a console, the laptop LCD turns black and the second monitor keeps showing the desktop as before, but the mouse pointer is gone.

Sometimes, but not always, I'm able to SSH into the machine and reboot it.  If not, I have to halt it using Magic SysRQ keys.

I think it started when the intel driver was updated from 2.20.9 to 2.20.10.  Since then, I've had the same lockups with the newer intel drivers 2.20.12 and 2.20.13.  I'm going to downgrade to 2.20.9 again, which does not seem to trigger the lockup.  (If it does, I'll report back.  It's possible that this bug has nothing to do with the intel driver...)

Graphics card information from lshw:

     *-pci
          description: Host bridge
          product: Mobile PM965/GM965/GL960 Memory Controller Hub
          vendor: Intel Corporation
          physical id: 100
          bus info: pci@0000:00:00.0
          version: 0c
          width: 32 bits
          clock: 33MHz
          configuration: driver=agpgart-intel
          resources: irq:0
        *-display:0
             description: VGA compatible controller
             product: Mobile GM965/GL960 Integrated Graphics Controller (primary)
             vendor: Intel Corporation
             physical id: 2
             bus info: pci@0000:00:02.0
             version: 0c
             width: 64 bits
             clock: 33MHz
             capabilities: msi pm vga_controller bus_master cap_list rom
             configuration: driver=i915 latency=0
             resources: irq:45 memory:f8100000-f81fffff memory:e0000000-efffffff ioport:1800(size=8)
        *-display:1 UNCLAIMED
             description: Display controller
             product: Mobile GM965/GL960 Integrated Graphics Controller (secondary)
             vendor: Intel Corporation
             physical id: 2.1
             bus info: pci@0000:00:02.1
             version: 0c
             width: 64 bits
             clock: 33MHz
             capabilities: pm bus_master cap_list
             configuration: latency=0
             resources: memory:f8200000-f82fffff

The Xorg.0.log.old file doesn't contain any useful information wrt the lockup.  I'll attach it anyway.

I'll also attach a backtrace I gathered using GDB by SSHing into the locked up machine.
Comment 1 Tassilo Horn 2012-11-14 15:16:50 UTC
Created attachment 70073 [details]
Backtrace from: gdb /usr/bin/X `pgrep X`
Comment 2 Chris Wilson 2012-11-14 15:18:28 UTC
What type of desktop environment do you have? Specifically, are you using an OpenGL compositor?
Comment 3 Tassilo Horn 2012-11-14 15:43:47 UTC
(In reply to comment #2)
> What type of desktop environment do you have? Specifically, are you using an
> OpenGL compositor?

I use GNOME3 (not the fallback mode), so yes, I'm probably using an OpenGL compositor.

What I forgot to mention:  I use Linux kernel version 3.6.6, and the intel driver is configured with these options:

  configure --prefix=/usr --build=x86_64-pc-linux-gnu \
    --host=x86_64-pc-linux-gnu --mandir=/usr/share/man \
    --infodir=/usr/share/info --datadir=/usr/share \
    --sysconfdir=/etc --localstatedir=/var/lib \
    --libdir=/usr/lib64 --disable-dependency-tracking \
    --docdir=/usr/share/doc/xf86-video-intel-2.20.13 \
    --enable-dri --enable-glamor --enable-sna --disable-uxa \
    --enable-udev --disable-xvmc
Comment 4 Chris Wilson 2012-11-21 10:16:27 UTC
Ok, I think this should be fixed by

commit 9ab1d1f94e502e5fde87e7c171f3502f8a55f22b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 20 18:42:58 2012 +0000

    sna/dri: Queue a vblank-continuation after flip-completion
    
    If a vblank request was delayed due to a pending flip, we need to make
    sure that we then queue it after that flip or else progress ceases.
    
    Reported-by: Jiri Slaby <jirislaby@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56423
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57156
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Please test and reopen if it is not.
Comment 5 Tassilo Horn 2012-11-29 07:22:20 UTC
> Ok, I think this should be fixed by
> 
> commit 9ab1d1f94e502e5fde87e7c171f3502f8a55f22b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Nov 20 18:42:58 2012 +0000
> 
>     sna/dri: Queue a vblank-continuation after flip-completion
>     
>     If a vblank request was delayed due to a pending flip, we need to make
>     sure that we then queue it after that flip or else progress ceases.
>     
>     Reported-by: Jiri Slaby <jirislaby@gmail.com>
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56423
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57156
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Please test and reopen if it is not.

Is this commit in the intel driver version 2.20.14?  If so, then the problem is not fixed.

Yesterday I updated to version 2.20.14.  Now at work with 2 outputs attached, it took only 20 minutes until I had the next freeze.  Sadly, I couldn't even ssh into the machine to get a backtrace, so I cannot say for sure it was the identical problem.  But at least the symptoms were the same.
Comment 6 Tassilo Horn 2012-11-29 07:37:12 UTC
I've installed the git version now to be able to quickly test-drive possible fixes.
Comment 7 Chris Wilson 2012-11-29 08:53:16 UTC
If the machine was no longer accessible over the network, you have a much more serious problem. The first issue you reported was that your compositing manager stopped drawing, a symptom that is likely fixed by the patch I suggested. This issue is however a system freeze, a kernel bug. Look for anything unusual in dmesg.
Comment 8 Tassilo Horn 2012-11-29 09:54:55 UTC
(In reply to comment #7)
> If the machine was no longer accessible over the network, you have a much
> more serious problem. The first issue you reported was that your compositing
> manager stopped drawing, a symptom that is likely fixed by the patch I
> suggested.

Well, I could still move my mouse, and the mouse pointer even changed its shape over window borders.  Not sure if that falls under the responsibility of the compositing manager...

> This issue is however a system freeze, a kernel bug.

Hm, ok.  But in any case, I've been getting the same symptoms since 2.20.10.  I'd say in about 50% of the cases I could still ssh to the machine and get a backtrace like the attached one, and in the other 50% I couldn't connect to it.

> Look for anything unusual in dmesg.

The dmesg log starts at boot time.  How can I get the previous dmesg log after rebooting the machine?

But here's the journald log for the period around the latest X freeze.  The freeze occurred at around 08:07.  I immediately went into another room, logged into another machine and tried to ssh into my machine.  You can see the ssh logins failing because of some pam issue.  At 08:10:53 I was back in my room and rebooted using Magic SysRQ keys.

Nov 29 08:07:01 thinkpad /USR/SBIN/CROND[3558]: (horn) CMD (cd ~/GoogleDrive/ && ping -c 1 www.google.de > /dev/null 2>&1 && grive > /dev/null 2>&1)
Nov 29 08:07:11 thinkpad CROND[3557]: pam_unix(crond:session): session closed for user horn
Nov 29 08:07:15 thinkpad kernel: usb 1-4.6.1: unlink qh8-0e01/ffff880131cefd00 start 5 [1/2 us]
Nov 29 08:08:01 thinkpad crond[3594]: pam_unix(crond:session): session opened for user horn by (uid=0)
Nov 29 08:08:26 thinkpad crond[3594]: pam_systemd(crond:session): Failed to create session: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Nov 29 08:08:26 thinkpad /USR/SBIN/CROND[3600]: (horn) CMD (cd ~/GoogleDrive/ && ping -c 1 www.google.de > /dev/null 2>&1 && grive > /dev/null 2>&1)
Nov 29 08:09:01 thinkpad crond[3608]: pam_unix(crond:session): session opened for user horn by (uid=0)
Nov 29 08:09:09 thinkpad dhcpcd[428]: wlan0: renewing lease of 141.26.93.134
Nov 29 08:09:11 thinkpad systemd[1]: Starting OpenSSH per-connection server daemon...
Nov 29 08:09:11 thinkpad systemd[1]: Started OpenSSH per-connection server daemon.
Nov 29 08:09:12 thinkpad sshd[3611]: SSH: Server;Ltype: Kex;Remote: 141.26.69.157-41931;Enc: aes128-ctr;MAC: hmac-md5;Comp: none [preauth]
Nov 29 08:09:14 thinkpad sshd[3611]: SSH: Server;Ltype: Authname;Remote: 141.26.69.157-41931;Name: horn [preauth]
Nov 29 08:09:19 thinkpad sshd[3611]: Accepted keyboard-interactive/pam for horn from 141.26.69.157 port 41931 ssh2
Nov 29 08:09:19 thinkpad sshd[3611]: pam_unix(sshd:session): session opened for user horn by (uid=0)
Nov 29 08:09:26 thinkpad crond[3608]: pam_systemd(crond:session): Failed to create session: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Nov 29 08:09:26 thinkpad /USR/SBIN/CROND[3619]: (horn) CMD (cd ~/GoogleDrive/ && ping -c 1 www.google.de > /dev/null 2>&1 && grive > /dev/null 2>&1)
Nov 29 08:09:44 thinkpad sshd[3611]: pam_systemd(sshd:session): Failed to create session: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Nov 29 08:09:59 thinkpad console-kit-daemon[259]: console-kit-daemon[259]: WARNING: The program /usr/lib/ConsoleKit/run-session.d/pam-foreground-compat.ck didn't exit within 15 seconds; killing it
Nov 29 08:09:59 thinkpad console-kit-daemon[259]: WARNING: The program /usr/lib/ConsoleKit/run-session.d/pam-foreground-compat.ck didn't exit within 15 seconds; killing it
Nov 29 08:10:01 thinkpad crond[3630]: pam_unix(crond:session): session opened for user horn by (uid=0)
Nov 29 08:10:01 thinkpad crond[3629]: pam_unix(crond:session): session opened for user root by (uid=0)
Nov 29 08:10:24 thinkpad sshd[3611]: pam_systemd(sshd:session): Failed to create session: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Nov 29 08:10:24 thinkpad sshd[3611]: Received disconnect from 141.26.69.157: 11: disconnected by user
Nov 29 08:10:24 thinkpad sshd[3611]: pam_unix(sshd:session): session closed for user horn
Nov 29 08:10:26 thinkpad crond[3630]: pam_systemd(crond:session): Failed to create session: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Nov 29 08:10:26 thinkpad crond[3629]: pam_systemd(crond:session): Failed to create session: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Nov 29 08:10:26 thinkpad /USR/SBIN/CROND[3646]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Nov 29 08:10:26 thinkpad /USR/SBIN/CROND[3645]: (horn) CMD (cd ~/GoogleDrive/ && ping -c 1 www.google.de > /dev/null 2>&1 && grive > /dev/null 2>&1)
Nov 29 08:10:39 thinkpad console-kit-daemon[259]: console-kit-daemon[259]: WARNING: The program /usr/lib/ConsoleKit/run-session.d/pam-foreground-compat.ck didn't exit within 15 seconds; killing it
Nov 29 08:10:39 thinkpad console-kit-daemon[259]: WARNING: The program /usr/lib/ConsoleKit/run-session.d/pam-foreground-compat.ck didn't exit within 15 seconds; killing it
Nov 29 08:10:53 thinkpad systemd-journal[3667]: Allowing system journal files to grow to 4.0G.
Nov 29 08:10:53 thinkpad kernel: ACPI: Invalid Power Resource to register!
Nov 29 08:10:53 thinkpad kernel: SysRq : 
Nov 29 08:10:53 thinkpad kernel: Emergency Sync
Nov 29 08:10:53 thinkpad kernel: Emergency Sync complete
Nov 29 08:10:53 thinkpad kernel: SysRq : 
Nov 29 08:10:53 thinkpad kernel: Emergency Sync
Nov 29 08:10:53 thinkpad kernel: Emergency Sync complete
Nov 29 08:10:53 thinkpad kernel: SysRq : 
Nov 29 08:10:53 thinkpad kernel: Kill All Tasks
Nov 29 08:10:53 thinkpad systemd[1]: Unit systemd-journald.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: cups.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit cups.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: ntp.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit ntp.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: cronie.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: colord.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit colord.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: accounts-daemon.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit accounts-daemon.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: wpa_supplicant.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit wpa_supplicant.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: avahi-daemon.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit avahi-daemon.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: NetworkManager.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit NetworkManager.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: lightdm.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: dbus.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit dbus.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: polkit.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit polkit.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: upower.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit upower.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: udisks2.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit udisks2.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: systemd-udevd.service holdoff time over, scheduling restart.
Nov 29 08:10:53 thinkpad systemd[1]: Stopping udev Kernel Device Manager...
Nov 29 08:10:53 thinkpad systemd[1]: Starting udev Kernel Device Manager...
Nov 29 08:10:53 thinkpad systemd[1]: systemd-journald.service holdoff time over, scheduling restart.
Nov 29 08:10:53 thinkpad systemd[1]: Stopping Journal Service...
Nov 29 08:10:53 thinkpad systemd[1]: Starting Journal Service...
Nov 29 08:10:53 thinkpad systemd[1]: Started Journal Service.
Nov 29 08:10:53 thinkpad systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Nov 29 08:10:53 thinkpad systemd-udevd[3666]: starting version 195
Nov 29 08:10:53 thinkpad systemd-journal[3667]: Journal started
Nov 29 08:10:53 thinkpad systemd[1]: systemd-udevd.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Unit systemd-udevd.service entered failed state
Nov 29 08:10:53 thinkpad systemd[1]: systemd-journald.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:53 thinkpad systemd[1]: Started udev Kernel Device Manager.
Nov 29 08:10:53 thinkpad systemd-udevd[3666]: IMPORT{builtin}: 'uaccess' unknown /usr/lib64/udev/rules.d/73-seat-late.rules:15
Nov 29 08:10:53 thinkpad systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
Nov 29 08:10:54 thinkpad systemd-journal[3673]: Allowing system journal files to grow to 4.0G.
Nov 29 08:10:54 thinkpad systemd-journald[3667]: Received SIGUSR1
Nov 29 08:10:54 thinkpad kernel: SysRq : 
Nov 29 08:10:54 thinkpad kernel: Kill All Tasks
Nov 29 08:10:54 thinkpad systemd[1]: Unit systemd-journald.service entered failed state
Nov 29 08:10:54 thinkpad systemd[1]: systemd-udevd.service holdoff time over, scheduling restart.
Nov 29 08:10:54 thinkpad systemd[1]: Stopping udev Kernel Device Manager...
Nov 29 08:10:54 thinkpad systemd[1]: Starting udev Kernel Device Manager...
Nov 29 08:10:54 thinkpad systemd[1]: systemd-journald.service holdoff time over, scheduling restart.
Nov 29 08:10:54 thinkpad systemd[1]: Stopping Journal Service...
Nov 29 08:10:54 thinkpad systemd[1]: Starting Journal Service...
Nov 29 08:10:54 thinkpad systemd-udevd[3672]: starting version 195
Nov 29 08:10:54 thinkpad systemd[1]: Started Journal Service.
Nov 29 08:10:54 thinkpad systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
Nov 29 08:10:54 thinkpad systemd-journal[3673]: Journal started
Nov 29 08:10:54 thinkpad systemd[1]: systemd-udevd.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:54 thinkpad systemd[1]: Unit systemd-udevd.service entered failed state
Nov 29 08:10:54 thinkpad systemd[1]: systemd-journald.service: main process exited, code=killed, status=9/KILL
Nov 29 08:10:54 thinkpad systemd[1]: Started udev Kernel Device Manager.
Nov 29 08:10:54 thinkpad systemd-udevd[3672]: IMPORT{builtin}: 'uaccess' unknown /usr/lib64/udev/rules.d/73-seat-late.rules:15
Nov 29 08:10:54 thinkpad systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
-- Reboot --
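
An excerpt like the one above, and the kernel messages from before the reboot, can be pulled out of a persistent journal after the fact.  A minimal sketch, assuming a journalctl that supports -k, boot offsets for -b, and --since/--until; the function names and the JOURNALCTL override variable are hypothetical:

```shell
# Hypothetical helpers; JOURNALCTL can be overridden (e.g. for a dry run)
# and defaults to the real journalctl binary.

# Print the kernel messages (dmesg equivalent) of the previous boot.
# Needs a persistent journal, otherwise earlier boots are not kept.
prev_boot_kernel_log() {
    ${JOURNALCTL:-journalctl} -k -b -1 --no-pager
}

# Print all journal entries between two timestamps.
# Usage: journal_window "2012-11-29 08:07" "2012-11-29 08:11"
journal_window() {
    ${JOURNALCTL:-journalctl} --since "$1" --until "$2" --no-pager
}
```

Setting JOURNALCTL=echo prints the arguments instead of running the command, which is a cheap way to check them first.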
Comment 9 Tassilo Horn 2012-11-29 13:10:37 UTC
Created attachment 70788 [details]
Another backtrace from git

I had another freeze with the intel driver from the current git HEAD.  It looks identical to the previous one, so it seems the patch didn't fix the issue.
Comment 10 Tassilo Horn 2012-11-29 13:12:08 UTC
Created attachment 70789 [details]
The dmesg output after the freeze

When I ssh-ed into the machine with the freeze, I also captured the dmesg output.  Doesn't look like a kernel oops, right?
Comment 11 Chris Wilson 2012-11-29 13:19:24 UTC
It is your compositing manager that is not rendering, and the process you should be xtracing and inspecting.
Comment 12 Tassilo Horn 2012-12-03 08:40:24 UTC
(In reply to comment #11)
> It is your compositing manager that is not rendering, and the process you
> should be xtracing and inspecting.

I've done that now, and I'm going to attach two tarballs containing the information from my latest two freezes.  I was running with the intel drivers from git, commit 37eb7343be1aeeb90a860096756603a577df1a77.

Every tarball contains

  1. a backtrace of X (which is the same as the previous backtraces attached to this issue)
  2. a backtrace of gnome-shell (the compositing manager of GNOME)
  3. an x11trace.txt.N file that contains the information from the command

         DISPLAY=:0 x11trace -o ~/x11trace.txt -- /usr/bin/gnome-shell

     from the invocation of gnome-shell until several minutes after the freeze.  (x11trace is the xtrace utility.  My distro renamed it because glibc also has an xtrace program.)

After the freezes occurred, I ssh-ed into the machine and made several backtraces of X and gnome-shell, i.e., I did one backtrace, waited a minute, then did another one.  I used a script that attaches GDB to the process, writes a backtrace to a file, and then detaches again and lets the process continue.
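
A minimal sketch of such a script (not the one actually used), assuming gdb is installed and the caller is allowed to attach to the target process, which for X typically means root.  The function names and the GDB override variable are hypothetical:

```shell
# Hypothetical helper: write one backtrace of a running process to a file.
# Usage: snapshot_bt <pid> <output-file>
# GDB can be overridden (e.g. for a dry run) and defaults to gdb.
snapshot_bt() {
    # -batch makes gdb run the -ex commands and then exit, which detaches
    # from the process and lets it continue.
    ${GDB:-gdb} -p "$1" -batch -ex "thread apply all bt" > "$2" 2>&1
}

# Hypothetical helper: take <count> backtraces, <interval> seconds apart,
# into <prefix>.1, <prefix>.2, ...
# Usage: snapshot_series <pid> <count> <interval> <prefix>
snapshot_series() {
    i=1
    while [ "$i" -le "$2" ]; do
        snapshot_bt "$1" "$4.$i"
        # Only wait between snapshots, not after the last one.
        if [ "$i" -lt "$2" ]; then
            sleep "$3"
        fi
        i=$((i + 1))
    done
}
```

For example, snapshot_series "$(pgrep -o gnome-shell)" 3 60 "$HOME/gnome-shell-bt" would take three backtraces of gnome-shell one minute apart.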

Some observations:

  - After a freeze, the backtraces of X and gnome-shell don't change anymore.  Diffing a backtrace against one taken a minute later doesn't show any difference.  Therefore, I've put only one into each of the tarballs.

  - After a freeze, there's still data written to the x11trace.txt file.

  - When I try to shut down or reboot the machine after a freeze by issuing "systemctl reboot|poweroff" while logged in via ssh, the X windows disappear, but then the system locks up completely and won't power off or reboot.
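
The check behind the first observation can be sketched as follows.  This is an illustration rather than the exact commands used; the function name is hypothetical, and it assumes the two backtraces were saved to plain files:

```shell
# Hypothetical helper: report whether two saved backtraces are identical.
# Usage: same_bt <file-a> <file-b>
same_bt() {
    # cmp -s prints nothing and only sets the exit status:
    # 0 if the files are byte-for-byte identical, non-zero otherwise.
    if cmp -s "$1" "$2"; then
        echo "unchanged"
    else
        echo "changed"
    fi
}
```

When the result is "changed", running diff -u on the same two files shows which frames moved between the snapshots.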

For the time being, I've switched back to some pre-2.20.10 intel driver.  But I can easily switch back to the git version if you need additional information.
Comment 13 Tassilo Horn 2012-12-03 08:41:25 UTC
Created attachment 70958 [details]
Backtraces of X/gnome-shell + x11trace information
Comment 14 Tassilo Horn 2012-12-03 08:41:54 UTC
Created attachment 70959 [details]
More backtraces of X/gnome-shell + x11trace information
Comment 15 Chris Wilson 2012-12-03 09:08:06 UTC
It is neither the DDX nor gnome-shell; the X server stops sending DAMAGE events.
Comment 16 Chris Wilson 2012-12-03 09:09:44 UTC
And I just spotted that g-s requested no further damage notifications.
Comment 17 Tassilo Horn 2012-12-03 10:05:55 UTC
(In reply to comment #16)
> And I just spotted that g-s requested no further damage notifications.

Thanks Chris for your assistance.

I've created a gnome-shell bug report at https://bugzilla.gnome.org/show_bug.cgi?id=689519.
Comment 18 Tassilo Horn 2013-01-02 11:30:02 UTC
The GNOME dev Jasper St. Pierre said that you, Chris, confirmed that it's not a bug in the gnome-shell's damage behavior, so I'm reopening this ticket:

  https://bugzilla.gnome.org/show_bug.cgi?id=689519#c8

In the meantime, I've upgraded to

  x11-base/xorg-server (1.13.1@27.12.2012)
  x11-drivers/xf86-video-intel (2.20.16@27.12.2012)
  media-libs/mesa (9.0.1@30.12.2012)
  x11-libs/libdrm (2.4.40@13.12.2012)

and still have those freezes.

However, I'm pretty certain that it has something to do with dual-screen setups.  Today, on my first day at work after 1.5 weeks, I had a freeze within 10 minutes of startup.  After a reboot, I disabled the notebook LCD and now only use the external monitor.  Since then, I haven't had another freeze.  At home, I don't use a second monitor, and there I've never had a freeze so far.

So disabling the dual-screen setup seems to be a workaround for avoiding the freezes, although it's not really feasible for me because I frequently have to use projectors to present my display to a student audience.
Comment 19 Chris Wilson 2013-01-02 11:40:30 UTC
Repeat with a 3.7 kernel in case it is one of the known pageflipping bugs, but given that the recent issue was that X stops sending damage events as opposed to g-s hanging, I doubt it.
Comment 20 Tassilo Horn 2013-01-02 11:44:40 UTC
(In reply to comment #19)
> Repeat with a 3.7 kernel in case it is one of the known pageflipping bugs,
> but given that the recent issue was that X stops sending damage events as
> opposed to g-s hanging, I doubt it.

Sorry, I forgot to mention that I already use a 3.7.1 kernel.
Comment 21 Jasper St. Pierre 2013-01-02 11:46:46 UTC
I'm quite sure that the freeze described in this bug is unrelated to DAMAGE, considering that gnome-shell, a GL direct rendering client, seems to be frozen.
Comment 22 Chris Wilson 2013-02-20 14:55:19 UTC
Ah gen4, can you please try testing with drm-intel-next,

commit 21ad833075801a7cd81b5ef1604ffc6c600e5ff9
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Tue Feb 19 15:16:39 2013 +0200

    drm/i915: Fix races in gen4 page flip interrupt handling

I think that neatly explains this bug.
Comment 23 Tassilo Horn 2013-02-20 18:27:21 UTC
(In reply to comment #22)
> Ah gen4, can you please try testing with drm-intel-next,

Um, what's drm-intel-next?  It doesn't seem to be a branch in xf86-video-intel or mesa...

BTW, a month ago I switched back to recent intel drivers and haven't encountered a hang since.  However, in that time frame I also didn't use a secondary monitor, and as said, those hangs only occurred when a second monitor was attached and activated.
Comment 24 Chris Wilson 2013-02-20 18:57:31 UTC
It's a branch of http://cgit.freedesktop.org/~danvet/drm-intel where we land features and fixes before sending upstream.
Comment 25 Tassilo Horn 2013-02-20 20:01:40 UTC
> It's a branch of http://cgit.freedesktop.org/~danvet/drm-intel where we land
> features and fixes before sending upstream.

Ah, it's in the kernel. :-)

Ok, I'm compiling it right now and I'll test it using a dual-monitor setup tomorrow.  That used to trigger the freeze pretty reliably last time.
Comment 26 Tassilo Horn 2013-02-21 17:59:12 UTC
Hey Chris, I've used the intel drivers from git and the drm-intel-next kernel branch the whole day with a dual-monitor setup, and I haven't encountered any freeze.  When I reported the bug, the freeze usually occurred within 10-20 minutes.  So it seems to be fixed.  Thanks!

