Bug 79009 - [g43] Missed irq leading to frozen display (due to vblank counter tracking)
Status: CLOSED INVALID
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Ville Syrjala
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2014-05-21 09:50 UTC by Bob Ham
Modified: 2017-07-24 22:54 UTC

Attachments
Kernel log of two repeated hangs, the last part of the kernel log before the first hang and the entirety of the log before the second hang (96.66 KB, text/plain)
2014-05-21 09:50 UTC, Bob Ham
X server log (40.21 KB, text/plain)
2014-05-21 09:50 UTC, Bob Ham
Kernel log from boot to first hang, including complete stack trace from Alt-SysRq-t (959.71 KB, text/plain)
2014-06-14 13:26 UTC, Bob Ham
Kernel log after second hang, of complete stack trace from Alt-SysRq-t (697.39 KB, text/plain)
2014-06-14 13:27 UTC, Bob Ham
Kernel log of stack trace from Alt-SysRq-t after first hang but before hang notification (901.68 KB, text/plain)
2014-06-22 20:54 UTC, Bob Ham
Kernel log of stack trace from Alt-SysRq-t after second hang after hang notification (701.84 KB, text/plain)
2014-06-22 20:55 UTC, Bob Ham
GDB backtrace of gnome-shell after the kernel's hangcheck notification (3.48 KB, text/plain)
2014-06-24 20:43 UTC, Bob Ham

Description Bob Ham 2014-05-21 09:50:06 UTC
Created attachment 99491
Kernel log of two repeated hangs, the last part of the kernel log before the first hang and the entirety of the log before the second hang

Using vanilla Linux 3.14, X server 1.12.4, Intel driver 2.21.15 and GNOME 3 on a G43 chipset, I get random lockups of the X desktop.  The mouse cursor moves but the rest of X is dead; nothing responds.  I can switch to the console (in order to reboot the computer) but if I switch back to X then the machine will hard lock within a short period.  A short while after I switch to the console, the kernel reports:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

This definitely happens a lot while composing emails in Evolution (and is therefore EXTREMELY ANNOYING :-).  I get the feeling it may happen some time after having dragged a window to a side edge of the screen to half-maximise it.

The lockups seem to happen in clumps.  That is, for weeks the machine will be stable with no problems but then there will be one lockup followed by another lockup, sometimes within minutes of rebooting, and sometimes repeating a number of times, eventually ending by my giving up using the computer.
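
A saved kernel log can be checked for the hangcheck report quoted above with a short shell helper. This is only a minimal sketch: the function name `count_hangchecks` and the log file name in the usage line are hypothetical, and the pattern simply matches the function name that appears in the message.

```shell
# Hypothetical helper: count hangcheck reports in a saved kernel log.
# Usage: count_hangchecks kernel.log
count_hangchecks() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c 'i915_hangcheck_elapsed' "$1" || true
}
```

For example, `count_hangchecks kernel.log` prints how many times the message was logged, which may help to tell a single hang from a clump of repeated ones.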
Comment 1 Bob Ham 2014-05-21 09:50:43 UTC
Created attachment 99492
X server log
Comment 2 Chris Wilson 2014-05-21 10:04:06 UTC
Hard lock? Not even ssh'able? It would be really useful to get the kernel and userspace stack traces of the stuck processes.

After a missed irq, we start busy-waiting for the GPU rather than rely on any more interrupts being received. Obviously that should not lead to a lock up.
Comment 3 Bob Ham 2014-05-21 10:12:15 UTC
(In reply to comment #2)
> Hard lock?

It hard locks if I (1) switch from X to the console and then (2) switch from the console back to X.  If I stay on the console there isn't a problem.

> Not even ssh'able?

Not even ssh'able.

> It would be really useful to get the kernel
> and userspace stack traces of the stuck processes.

What do you mean?  I'm not sure which processes you're referring to.
Comment 4 Chris Wilson 2014-06-06 08:38:30 UTC
(In reply to comment #3)
> > It would be really useful to get the kernel
> > and userspace stack traces of the stuck processes.
> 
> What do you mean?  I'm not sure which processes you're referring to.

All of them. sys-rq-t (echo t > /proc/sysrq-trigger)
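
The request above can be wrapped in a small shell helper that triggers the dump and then saves the kernel log to a file for attaching. This is a sketch under assumptions: the names `dump_all_stacks`, `SYSRQ_TRIGGER` and `KLOG_CMD` are hypothetical, and it assumes `dmesg` is available and that the helper is run as root.

```shell
# Hypothetical helper: ask the kernel to dump all task stacks, then save the log.
# SYSRQ_TRIGGER and KLOG_CMD are overridable so the helper can be dry-run.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
KLOG_CMD="${KLOG_CMD:-dmesg}"

dump_all_stacks() {
    out="$1"
    # Writing 't' requests a dump of every task and its kernel stack (needs root).
    printf 't\n' > "$SYSRQ_TRIGGER" || return 1
    # Give the kernel a moment to finish printing before reading the log.
    sleep 1
    # KLOG_CMD is left unquoted on purpose so it may carry arguments.
    $KLOG_CMD > "$out"
}
```

After a hang, `dump_all_stacks sysrq-t.log` would leave the trace in `sysrq-t.log`. A full dump can run to several hundred kilobytes, so the kernel log buffer may need to be large enough to hold it.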
Comment 5 Bob Ham 2014-06-06 09:31:06 UTC
I realise now that I mis-remembered the behaviour I reported previously.  The machine does not lock hard after X hangs and I switch to the console and back.  I tried this recently and no hard lock occurred.  Instead, the machine would lock hard if X hung, I switched to the console and then *restarted* X.


Recently, there was a serendipitous event which may shed light on the problem.  There was a hang while two user sessions were running.  That is, one user had been logged in and another user came along, selected "New Login" from the xscreensaver password dialog and then logged in to another GNOME 3 session themselves.  In the process list, there were two processes named X, running on different VTs.

In this state, there was the usual hang.  However, I managed to switch between the different X sessions.  After the hang, I switched from the active X session to the console and then switched again to the other X session.  After switching to the other X session, the display was still active but only for a few moments.  That is, IIRC, some animated parts of web pages were moving on the screen but then after a few moments the display froze.


(In reply to comment #4)
> (In reply to comment #3)
> > > It would be really useful to get the kernel
> > > and userspace stack traces of the stuck processes.
> > 
> > What do you mean?  I'm not sure which processes you're referring to.
> 
> All of them. sys-rq-t (echo t > /proc/sysrq-trigger)

The next time there is a hang, I will do this.
Comment 6 Bob Ham 2014-06-14 13:26:06 UTC
Created attachment 101048
Kernel log from boot to first hang, including complete stack trace from Alt-SysRq-t

I managed to capture a complete stack trace for all processes after a hang, and a second complete stack trace from the hang after restarting X.
Comment 7 Bob Ham 2014-06-14 13:27:34 UTC
Created attachment 101049
Kernel log after second hang, of complete stack trace from Alt-SysRq-t
Comment 8 Bob Ham 2014-06-22 20:54:42 UTC
Created attachment 101538
Kernel log of stack trace from Alt-SysRq-t after first hang but before hang notification
Comment 9 Bob Ham 2014-06-22 20:55:57 UTC
Created attachment 101539
Kernel log of stack trace from Alt-SysRq-t after second hang after hang notification

Note that the hang notification occurred only after restarting X.
Comment 10 Bob Ham 2014-06-22 20:57:13 UTC
Please note that the hang does not lead to a hard lock anymore.  It certainly did in the past, presumably with earlier kernels/drivers but I no longer get a hard lock even after restarting X a number of times.
Comment 11 Bob Ham 2014-06-24 20:43:15 UTC
Created attachment 101696
GDB backtrace of gnome-shell after the kernel's hangcheck notification
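
A userspace backtrace of this kind can be captured non-interactively. The following is a minimal sketch, assuming `gdb` is installed and allowed to attach to the process; the helper name `save_backtrace` and the `GDB` override variable are hypothetical and not part of this report.

```shell
# Hypothetical helper: save a backtrace of every thread of a running process.
# Usage: save_backtrace <pid> <output-file>
GDB="${GDB:-gdb}"

save_backtrace() {
    pid="$1"
    out="$2"
    # -p attaches to the process, -batch exits once the commands have run,
    # -ex runs the given gdb command ("thread apply all bt" covers all threads).
    "$GDB" -p "$pid" -batch -ex 'thread apply all bt' > "$out" 2>&1
}
```

For gnome-shell this might be invoked as `save_backtrace "$(pgrep -o gnome-shell)" gnome-shell-bt.txt` from a console or ssh session while the display is frozen.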
Comment 12 Chris Wilson 2014-06-25 08:13:31 UTC
The second bug here is a frozen display due to a bad vblank counter. Ville has recently fixed many bugs with our tracking of vblanks, so there is a good chance the latter bug is fixed. Of course we still have the earlier issue, the missed irq, but it would be good if we could at least confirm that the display no longer freezes.
Comment 13 Bob Ham 2014-06-25 08:19:49 UTC
(In reply to comment #12)
> Ville has
> recently fixed many bugs with our tracking of vblanks, so there is a good
> chance the latter bug is fixed. Of course we still have the earlier issue -
> but it would be good if we could confirm that it no longer freezes at least.

Are these changes in a specific release, or are they only in git?  And if only in git, which repository and branch?  Thanks.
Comment 14 Jani Nikula 2014-09-08 09:41:00 UTC
Bob, sorry we've neglected this bug a bit. Please try drm-intel-nightly branch of http://cgit.freedesktop.org/drm-intel/ and report back. Thanks.
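
Fetching a single branch such as drm-intel-nightly can be sketched as below. The thread gives only the cgit web page, not a clone URL, so the URL is left as a parameter; the helper name `fetch_branch` is hypothetical and `git` is assumed to be installed.

```shell
# Hypothetical helper: clone one branch of a repository into a directory.
# Usage: fetch_branch <clone-url> <branch> <target-dir>
fetch_branch() {
    url="$1"
    branch="$2"
    dir="$3"
    # --branch checks out the named branch; --single-branch limits the
    # download to that branch's history.
    git clone --quiet --branch "$branch" --single-branch "$url" "$dir"
}
```

With the repository's clone URL in hand, `fetch_branch <clone-url> drm-intel-nightly drm-intel` would leave the branch checked out in `drm-intel`, ready for a kernel build.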
Comment 15 Jesse Barnes 2014-12-04 22:03:10 UTC
Bob, any update?
Comment 16 Bob Ham 2014-12-04 22:41:50 UTC
Sorry, the crashes got too much so I ditched the motherboard.  I can't help with debugging this issue anymore.
Comment 17 Jesse Barnes 2014-12-05 18:17:36 UTC
Ok sorry for the trouble, thanks for the effort anyway.

