Bug 73744 - [NVA8] Constant lock ups with NVIDIA GeForce 8400 GS
Status: RESOLVED INVALID
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2014-01-17 15:04 UTC by grave_123
Modified: 2016-02-23 07:19 UTC

Attachments
dmesg_log (83.93 KB, text/plain)
2014-01-18 22:20 UTC, grave_123
nvapeek output with envytools (38 bytes, text/plain)
2014-01-18 22:48 UTC, grave_123
Log file with segfault in it (79.75 KB, text/plain)
2014-02-08 17:26 UTC, grave_123
dmesg from latest lock up (237.89 KB, text/plain)
2014-02-11 02:33 UTC, grave_123

Description grave_123 2014-01-17 15:04:20 UTC
Hello,

I am currently using the non-free NVIDIA drivers with a GeForce 8400 GS. My screen locks up whenever I use Nouveau to the point where I have to hard-shutdown my machine. I can't afford hardware damage so I'm temporarily using the non-free drivers.

Anyway ...

I am not sure what specifically causes the lock ups. I cannot get to an emergency terminal to determine the cause in the moment either. At first, everything but the mouse cursor and sound seizes up, and then those two seize up as well and there's excessive HDD activity. My only choice is to press and hold the power button and wait for the machine to shut down.

I don't currently have a log file for the issue. I will switch back to Nouveau once I get a response of some kind. Haha.


OS: Ubuntu GNU/Linux x86-64 13.10
Linux: 3.11.0-15-generic
Graphics Card: NVIDIA GeForce 8400 GS
X.Org version: 1.14.5
Comment 1 Ilia Mirkin 2014-01-17 16:04:32 UTC
It'd be great to see a dmesg from an affected boot. One issue that has recently been identified is that some cards are missing a PCRYPT unit -- you can test out this theory by booting with nouveau.config=PCRYPT=0
Comment 2 grave_123 2014-01-18 06:35:34 UTC
Thank you for your response and (hopefully) continued responses!

I will send a txt output of a dmesg later on today as well as try out that test with PCRYPT and send a log of that output as well! :)
Comment 3 grave_123 2014-01-18 22:20:00 UTC
Created attachment 92359 [details]
dmesg_log

Here's the dmesg log when using Nouveau.
Comment 4 Ilia Mirkin 2014-01-18 22:34:14 UTC
Ah, interesting. This is a different 8400 GS than I thought (marketing names are so misleading). That PCRYPT=0 thing will do nothing useful. But.... it might still be the same issue I thought it might be, just not as visible in kernel messages.

Can you grab a copy of envytools (https://github.com/envytools/envytools), build it, and run

nvapeek 1540
nvapeek 154c

and report back with the results of those commands? (You can do this while on blob drivers as well as nouveau, doesn't matter.) Additionally, try booting with nouveau.config=PCE0=0 in the kernel cmdline and see if that fixes your issues. (That should cause it to use M2MF instead of COPY for buffer copies.)
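
To double-check that an option such as nouveau.config=PCE0=0 actually reached the booted kernel, the kernel command line can be inspected after boot. The following is only a rough sketch in Python: the helper name nouveau_config_options is made up for illustration, and it assumes that multiple options inside one nouveau.config= argument are separated by commas.

```python
def nouveau_config_options(cmdline: str) -> dict:
    """Collect NAME=VALUE pairs from every nouveau.config=... argument.

    Hypothetical helper; assumes options are comma-separated.
    """
    prefix = "nouveau.config="
    options = {}
    for arg in cmdline.split():
        if not arg.startswith(prefix):
            continue
        for item in arg[len(prefix):].split(","):
            name, sep, value = item.partition("=")
            if name:
                # An option given without "=VALUE" is stored with an empty value.
                options[name] = value if sep else ""
    return options


# Example command line; only the nouveau.config part comes from this report.
example = "ro quiet nouveau.config=PCE0=0"
print(nouveau_config_options(example))
```

On a running system the same function could be fed the contents of /proc/cmdline, which holds the command line the current kernel was booted with.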
Comment 5 grave_123 2014-01-18 22:48:14 UTC
Created attachment 92364 [details]
nvapeek output with envytools

This is the output for nvapeek with the commands run in the exact order posted.
Comment 6 Ilia Mirkin 2014-01-18 22:50:53 UTC
OK, yeah, it's not at all the thing I thought it might be -- your PCOPY engine is intact:

$ lookup -a a8 1540 f2010001
PBUS.HWUNITS_0 => { TPC_MASK = 0x1 | PART_MASK = 0x1 | MP_MASK = 0x2 | UNK28 | UNK29 | VDEC | UNK31 }
$ lookup -a a8 154c 0000023d
PBUS.HWUNITS_1 => { PCIE_VERSION = 2 | PCI_CLASS = DISPLAY | PDISPLAY | UNK3 = 0x3 | PVLD | PCIE_SPEED = 2P5GT | PCOPY }

So... it's something else. Sorry, that's of little help to you.
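
For anyone following along without envytools at hand, a small Python sketch can at least show which bits are set in the two register values passed to lookup above. It does not name the bits; mapping positions to names such as PCOPY is what lookup does, and the helper name set_bits is just an illustration.

```python
def set_bits(value: int) -> list:
    """Return the positions of the bits that are set in a 32-bit value."""
    return [bit for bit in range(32) if (value >> bit) & 1]


# Register values copied from the lookup commands above.
hwunits_0 = 0xF2010001  # read from 0x1540
hwunits_1 = 0x0000023D  # read from 0x154c

print(set_bits(hwunits_0))  # [0, 16, 25, 28, 29, 30, 31]
print(set_bits(hwunits_1))  # [0, 2, 3, 4, 5, 9]
```

Bits 28, 29 and 31 of the first value appear to line up with the UNK28, UNK29 and UNK31 names in the lookup output.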
Comment 7 grave_123 2014-01-18 23:06:38 UTC
(In reply to comment #6)
> OK, yeah, it's not at all the thing I thought it might be -- your PCOPY
> engine is intact:
> 
> $ lookup -a a8 1540 f2010001
> PBUS.HWUNITS_0 => { TPC_MASK = 0x1 | PART_MASK = 0x1 | MP_MASK = 0x2 | UNK28
> | UNK29 | VDEC | UNK31 }
> $ lookup -a a8 154c 0000023d
> PBUS.HWUNITS_1 => { PCIE_VERSION = 2 | PCI_CLASS = DISPLAY | PDISPLAY | UNK3
> = 0x3 | PVLD | PCIE_SPEED = 2P5GT | PCOPY }
> 
> So... it's something else. Sorry, that's of little help to you.

Okay, so what do we do now? :)
Comment 8 Ilia Mirkin 2014-01-28 02:09:02 UTC
There was an issue that suggested that 3.10.x worked well for someone but they started seeing issues with nva8 on 3.11. Perhaps you can try a 3.10.x kernel. See bug #69029. If this turns out to help, then do a bisect to figure out the offending commit.

Also it'd be interesting to see a log after the hang happens -- see if you can ssh into the machine after the screen hangs.
Comment 9 grave_123 2014-02-01 20:52:07 UTC
(In reply to comment #8)
> There was an issue that suggested that 3.10.x worked well for someone but
> they started seeing issues with nva8 on 3.11. Perhaps you can try a 3.10.x
> kernel. See bug #69029. If this turns out to help, then do a bisect to
> figure out the offending commit.
> 
> Also it'd be interesting to see a log after the hang happens -- see if you
> can ssh into the machine after the screen hangs.

If I can SSH into the machine, what information would you want me to snag in order to work further towards solving this issue? I have SSH set up right now.
Comment 10 Ilia Mirkin 2014-02-01 20:54:41 UTC
(In reply to comment #9)
> If I can SSH into the machine, what information would you want me to snag in
> order to work further towards solving this issue? I have SSH set up right
> now.

See if there's anything interesting in dmesg I guess? It also would mean that the machine is fine, just the display has hung.
Comment 11 grave_123 2014-02-02 02:23:16 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > If I can SSH into the machine, what information would you want me to snag in
> > order to work further towards solving this issue? I have SSH set up right
> > now.
> 
> See if there's anything interesting dmesg I guess? It also would mean that
> the machine is fine, just the display has hung.

Alright I'll switch back to nouveau and report back when it happens! :)
Comment 12 grave_123 2014-02-08 17:25:11 UTC
Hey! The screen locked up and did something far worse than before. I was indeed able to SSH into my machine and I tried to make a log for dmesg but it was consumed by UFW output.

However ...

I did manage to cp all of my Xorg logs and found an interesting segfault! Attached the log file.
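
When firewall logging drowns out everything else in dmesg, filtering those lines before saving the log may help. Below is a minimal Python sketch, assuming the UFW entries carry a "[UFW " tag such as "[UFW BLOCK]"; the function name and the first sample line are made up for illustration and are not taken from the attached logs.

```python
def drop_ufw_lines(lines):
    """Yield only the log lines that are not UFW firewall entries.

    Assumes UFW entries contain a "[UFW " tag, e.g. "[UFW BLOCK]".
    """
    for line in lines:
        if "[UFW " not in line:
            yield line


sample = [
    "[UFW BLOCK] IN=eth0 OUT= SRC=192.0.2.1 DST=192.0.2.2",  # made-up entry
    "PGRAPH TLB flush idle timeout fail",
]
print(list(drop_ufw_lines(sample)))
```

The same function could be applied to the lines of a saved dmesg file before attaching it.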
Comment 13 grave_123 2014-02-08 17:26:02 UTC
Created attachment 93667 [details]
Log file with segfault in it

Not sure if this info for Xorg is helpful to the Nouveau folks but, shit, it's something.
Comment 14 Ilia Mirkin 2014-02-10 06:55:33 UTC
(In reply to comment #13)
> Created attachment 93667 [details]
> Log file with segfault in it
> 
> Not sure if this info for Xorg is helpful to the Nouveau folks but, shit,
> it's something.

Nope, that's just a crash that happens when the GPU locks up. The DDX isn't very good at handling that.
Comment 15 grave_123 2014-02-11 02:33:34 UTC
Created attachment 93820 [details]
dmesg from latest lock up

I think I got something good here! Here's the dmesg from the latest lock up.
Comment 16 Ilia Mirkin 2014-02-11 02:39:18 UTC
The relevant bits:

TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
 TRAP
ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data 0x3f800000
PGRAPH TLB flush idle timeout fail
PGRAPH_STATUS  : 0x00c00c03 BUSY DISPATCH CCACHE_UNK4 STRMOUT_GSCHED_UNK5 TPVP MP
PGRAPH_VSTATUS0: 0x00000208 CCACHE UNK5
PGRAPH_VSTATUS1: 0x00005600
PGRAPH_VSTATUS2: 0x00000000
TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
 TRAP
ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data 0x3f800000
TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
 TRAP
ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data 0x3f800000
PGRAPH TLB flush idle timeout fail
PGRAPH_STATUS  : 0x00c00c03 BUSY DISPATCH CCACHE_UNK4 STRMOUT_GSCHED_UNK5 TPVP MP
PGRAPH_VSTATUS0: 0x00000208 CCACHE UNK5
PGRAPH_VSTATUS1: 0x00005600
PGRAPH_VSTATUS2: 0x00000000
TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
 TRAP

So it looks like the code got overwritten with 0x0d (13) bytes. Never seen a TIMEOUT error before -- neat. Not sure what to make of it though.
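
To make the "overwritten with 0x0d bytes" observation easy to check across a long log, here is a hedged Python sketch that pulls the fields out of a TRAP_MP_EXEC line and tests whether both opcode words consist of one repeated byte. The regular expression is written against the lines quoted above only; the field names (word0, word1, and so on) are my own labels, not nouveau terminology.

```python
import re

# Pattern modelled on the TRAP_MP_EXEC lines quoted in this comment.
TRAP_RE = re.compile(
    r"TRAP_MP_EXEC - TP (?P<tp>\d+) MP (?P<mp>\d+): (?P<error>\w+) "
    r"at (?P<addr>[0-9a-f]+) warp (?P<warp>\d+), "
    r"opcode (?P<word0>[0-9a-f]{8}) (?P<word1>[0-9a-f]{8})"
)


def parse_trap(line: str):
    """Return the fields of a TRAP_MP_EXEC line as a dict, or None."""
    match = TRAP_RE.search(line)
    return match.groupdict() if match else None


def is_single_byte_fill(hex_word: str) -> bool:
    """True if every byte of the hex word is identical, e.g. 0d0d0d0d."""
    return len(set(bytes.fromhex(hex_word))) == 1


line = ("TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 07fcc0 warp 0, "
        "opcode 0d0d0d0d 0d0d0d0d")
trap = parse_trap(line)
print(trap["error"], trap["addr"])
print(is_single_byte_fill(trap["word0"]) and is_single_byte_fill(trap["word1"]))
```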
Comment 17 grave_123 2014-02-11 02:42:54 UTC
(In reply to comment #16)
> The relevant bits:
> 
> TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 07fcc0 warp 0, opcode 0d0d0d0d
> 0d0d0d0d
>  TRAP
> ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data
> 0x3f800000
> PGRAPH TLB flush idle timeout fail
> PGRAPH_STATUS  : 0x00c00c03 BUSY DISPATCH CCACHE_UNK4 STRMOUT_GSCHED_UNK5
> TPVP MP
> PGRAPH_VSTATUS0: 0x00000208 CCACHE UNK5
> PGRAPH_VSTATUS1: 0x00005600
> PGRAPH_VSTATUS2: 0x00000000
> TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
>  TRAP
> ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data
> 0x3f800000
> TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
>  TRAP
> ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data
> 0x3f800000
> PGRAPH TLB flush idle timeout fail
> PGRAPH_STATUS  : 0x00c00c03 BUSY DISPATCH CCACHE_UNK4 STRMOUT_GSCHED_UNK5
> TPVP MP
> PGRAPH_VSTATUS0: 0x00000208 CCACHE UNK5
> PGRAPH_VSTATUS1: 0x00005600
> PGRAPH_VSTATUS2: 0x00000000
> TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
>  TRAP
> 
> So it looks like the code got overwritten with 0x0d (13) bytes. Never seen a
> TIMEOUT error before -- neat. Not sure what to make of it though.

I hope this was a good chunk of info here. Thanks for the SSH idea by the way. Saved a thrashing on my hardware! So what's up now? What's the plan? :))
Comment 18 grave_123 2014-05-06 19:50:11 UTC
Hey, I'm still getting this issue to this day with Ubuntu 14.04.
Comment 19 Christopher M. Penalver 2016-02-23 07:19:15 UTC
grave_123@riseup.net, given that you are using a downstream version of nouveau, it would help immensely if you filed a new report with Ubuntu. Please ensure that the package xdiagnose is installed, run the following from a terminal, and click the Yes button to attach additional debugging information:
ubuntu-bug xorg

Also, please feel free to subscribe me to it.

For more on why this is helpful, please see https://wiki.ubuntu.com/ReportingBugs.