Bug 69029

Summary: [NVA8] GPU lockup since kernel 3.11 upgrade
Product: xorg
Component: Driver/nouveau
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Frederic Crozat <fred>
Assignee: Nouveau Project <nouveau>
QA Contact: Xorg Project Team <xorg-team>
CC: hugh
Attachments:
Description                               Flags
one lockup                                none
another lockup                            none
yet another lockup (with 3.11rc6)         none
log and system information from lockup    none

Description Frederic Crozat 2013-09-06 11:35:17 UTC
Created attachment 85322 [details]
one lockup

On openSUSE Factory (nouveau driver 1.0.9, Mesa 9.2), since the kernel was updated from 3.10.1 to 3.11.0 (rc6 or rc7), I have been getting GPU lockups very often when using Xorg (GNOME Shell).
Comment 1 Frederic Crozat 2013-09-06 11:36:08 UTC
Created attachment 85323 [details]
another lockup
Comment 2 Frederic Crozat 2013-09-06 12:28:02 UTC
Created attachment 85332 [details]
yet another lockup (with 3.11rc6)

Not sure whether this particular lockup is the same one, or whether it was fixed in rc7.
Comment 3 Ilia Mirkin 2013-09-06 16:16:04 UTC
As I think was pointed out on IRC, it'd be great to have a bisect, or a way to reproduce.

The errors in the log are of the form:

nouveau E[  PGRAPH][0000:01:00.0] TRAP_TPDMA_RT - TP 0 - Unknown fault at address 0048d59400
nouveau E[  PGRAPH][0000:01:00.0] TRAP_TPDMA_RT - TP 0 - e0c: 00000000, e18: 00000000, e1c: 0010059e, e20: 00002a00, e24: 00030000
nouveau E[  PGRAPH][0000:01:00.0]  TRAP
nouveau E[  PGRAPH][0000:01:00.0] ch 5 [0x000fc34000 gnome-shell[1708]] subc 3 class 0x8597 mthd 0x0f04 data 0x00000000
nouveau E[     PFB][0000:01:00.0] trapped write at 0x0048d59400 on channel 0x0000fc34 [gnome-shell[1708]] PGRAPH/PROP/RT0 reason: PAGE_NOT_PRESENT

There has been some TTM rework in 3.11, perhaps this is a fallout of that.
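
For anyone comparing logs across kernels, here is a minimal sketch of pulling the fields out of the PFB "trapped read/write" lines shown above. The regular expression is inferred only from the log lines in this report, not from the nouveau source, so other kernels may format the line differently. The names PfbTrap and parse_pfb_trap are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the PFB lines quoted in this bug report (an assumption,
# not taken from the driver source).
PFB_TRAP = re.compile(
    r"nouveau E\[\s*PFB\]\[(?P<pci>[0-9a-f:.]+)\]\s+"
    r"trapped (?P<access>read|write) at (?P<address>0x[0-9a-f]+) "
    r"on channel (?P<channel>0x[0-9a-f]+) "
    r"\[(?P<client>.+)\] "
    r"(?P<unit>\S+) reason: (?P<reason>\S+)"
)


class PfbTrap(NamedTuple):
    pci: str       # PCI address of the GPU, e.g. 0000:01:00.0
    access: str    # "read" or "write"
    address: int   # faulting address
    channel: int   # channel identifier
    client: str    # process name and pid, or "unknown"
    unit: str      # engine/client/subclient, e.g. PGRAPH/PROP/RT0
    reason: str    # e.g. PAGE_NOT_PRESENT


def parse_pfb_trap(line: str) -> Optional[PfbTrap]:
    """Return the parsed trap, or None if the line is not a PFB trap line."""
    m = PFB_TRAP.search(line)
    if m is None:
        return None
    return PfbTrap(
        pci=m.group("pci"),
        access=m.group("access"),
        address=int(m.group("address"), 16),
        channel=int(m.group("channel"), 16),
        client=m.group("client"),
        unit=m.group("unit"),
        reason=m.group("reason"),
    )


if __name__ == "__main__":
    sample = (
        "nouveau E[     PFB][0000:01:00.0] trapped write at 0x0048d59400 "
        "on channel 0x0000fc34 [gnome-shell[1708]] PGRAPH/PROP/RT0 "
        "reason: PAGE_NOT_PRESENT"
    )
    print(parse_pfb_trap(sample))
```

Grouping the parsed traps by unit and reason makes it easier to tell whether two reports show the same fault or only similar symptoms.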
Comment 4 charon00 2013-09-23 19:20:10 UTC
I have also been encountering problems with the X server crashing with kernel 3.11.1 on Fedora 19 64-bit (KDE desktop). Usually the desktop locks up and eventually drops to the command line, where messages like the following are printed:

nouveau E[ X[945]] failed to idle channel 0xcccc0000 [X[945]]
nouveau E[ X[945]] failed to idle channel 0xcccc0000 [X[945]]
nouveau E[    PFB][0000:01:00.0] trapped read at 0x002001a040 on channel 0x0003fb1a [unknown] SEMAPHORE_BG/PFIFO_READ/00 reason: PAGE_NOT_PRESENT
Comment 5 Frederic Crozat 2013-09-25 13:20:40 UTC
Since upgrading to 3.11.1 (and some other update), I haven't been able to get GPU lockup for a week..
Comment 6 Frederic Crozat 2013-09-27 07:18:42 UTC
(In reply to comment #5)
> Since upgrading to 3.11.1 (and some other update), I haven't been able to
> get GPU lockup for a week..

I spoke too soon: I got a lockup last night, after 1 week without one (the system was idle, with the screen locked) :(
Comment 7 Allan Oepping 2013-11-14 16:39:11 UTC
Created attachment 89204 [details]
log and system information from lockup
Comment 8 Allan Oepping 2013-11-14 16:43:25 UTC
I've had this issue for some time. The system works fine until the first lockup. After that it locks up (for about 30 seconds) every time I unlock my desktop. The lockup sometimes starts or occurs when moving the mouse to the top left corner in gnome-shell. Rebooting fixes the issue for a number of days.

https://bugs.freedesktop.org/attachment.cgi?id=89204

Thanks.
Comment 9 D. Hugh Redelmeier 2013-11-14 20:48:51 UTC
H.J. Lu points out that this might be the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=918732

I'm hitting that fairly often.
Comment 10 Ilia Mirkin 2014-01-28 03:10:23 UTC
Allan: You have a NV92 card, and what appears to be a very different issue.

Frederic: Do I understand correctly that everything is consistently fine in 3.10.x and consistently broken in 3.11.x? If so, could you try doing a bisect to identify the offending commit? (First might be worth checking if the issue still occurs in 3.13 btw.) BTW, your third "lockup" log doesn't indicate that anything really went wrong -- just a kmalloc() failure which should be handled by the surrounding code. Although the later xfs corruption errors are a bit ominous... 

In any case, this is highly unlikely to see any progress without a git bisect on the kernel (drivers/gpu/drm should be enough).
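
A minimal sketch of how such a path-limited bisect could be started. It assumes a checkout of the mainline kernel tree, with the tags v3.10 and v3.11 standing in for the last known-good and first known-bad kernels. The helper name bisect_start_argv is hypothetical, and the script only prints the command rather than running it.

```python
import shlex
from typing import List


def bisect_start_argv(bad: str, good: str, path: str) -> List[str]:
    """Build the argv for `git bisect start <bad> <good> -- <path>`.

    Limiting the bisect to a path makes git consider only commits that
    touch that path, which shortens the search considerably.
    """
    return ["git", "bisect", "start", bad, good, "--", path]


if __name__ == "__main__":
    # v3.11 / v3.10 are assumed endpoints; substitute the kernels actually tested.
    argv = bisect_start_argv("v3.11", "v3.10", "drivers/gpu/drm")
    print(shlex.join(argv))
```

After building and booting each kernel that git checks out, the result is recorded with `git bisect good` or `git bisect bad` until git names the first bad commit.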
Comment 11 Frederic Crozat 2014-01-28 09:21:52 UTC
(In reply to comment #10)
> Frederic: Do I understand correctly that everything is consistently fine in
> 3.10.x and consistently broken in 3.11.x? If so, could you try doing a
> bisect to identify the offending commit? (First might be worth checking if
> the issue still occurs in 3.13 btw.) BTW, your third "lockup" log doesn't
> indicate that anything really went wrong -- just a kmalloc() failure which
> should be handled by the surrounding code. Although the later xfs corruption
> errors are a bit ominous... 

With 3.11.6, I get very few lockups (if any), which makes a bisection irrelevant or impossible to do.

The xfs corruption was caused by running mkinitrd, which was trying to load the xfs module even though there are no xfs partitions anywhere (this is now fixed).
Comment 12 Ilia Mirkin 2015-10-22 08:03:16 UTC
Sounds like the OP's issues are gone. Feel free to reopen if I misunderstood.
