| Field | Value |
|---|---|
| Summary | [NVA8] Constant lock ups with NVIDIA GeForce 8400 GS |
| Product | xorg |
| Component | Driver/nouveau |
| Status | RESOLVED INVALID |
| Severity | blocker |
| Priority | medium |
| Version | unspecified |
| Hardware | x86-64 (AMD64) |
| OS | Linux (All) |
| Reporter | grave_123 |
| Assignee | Nouveau Project <nouveau> |
| QA Contact | Xorg Project Team <xorg-team> |

Description
grave_123
2014-01-17 15:04:20 UTC
It'd be great to see a dmesg from an affected boot. One issue that has recently been identified is that some cards are missing a PCRYPT unit -- you can test out this theory by booting with nouveau.config=PCRYPT=0

Thank you for your response and (Hopefully) continued response! I will send a txt output of a dmesg later on today as well as try out that test with PCRYPT and send a log of that output as well! :)

Created attachment 92359 [details]
dmesg_log
Here's the dmesg log when using Nouveau.
Ah, interesting. This is a different 8400 GS than I thought (marketing names are so misleading). That PCRYPT=0 thing will do nothing useful. But.... it might still be the same issue I thought it might be, just not as visible in kernel messages. Can you grab a copy of envytools (https://github.com/envytools/envytools), build it, and run

nvapeek 1540
nvapeek 154c

and report back with the results of those commands? (You can do this while on blob drivers as well as nouveau, doesn't matter.)

Additionally, try booting with nouveau.config=PCE0=0 in the kernel cmdline and see if that fixes your issues. (That should cause it to use M2MF instead of COPY for buffer copies.)

Created attachment 92364 [details]
nvapeek output with envytools
This is the output for nvapeek with the commands run in the exact order posted.
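
For the nouveau.config=PCE0=0 test suggested above, it can be worth confirming that the option actually reached the kernel before drawing conclusions. The following is a minimal sketch, not part of nouveau or envytools: the function name `nouveau_config` is made up for illustration, and it assumes that multiple settings are passed as a comma-separated list after `nouveau.config=`.

```python
def nouveau_config(cmdline):
    """Return the nouveau.config settings found in a kernel command line.

    `cmdline` is the text of the kernel command line (on Linux this is
    what /proc/cmdline contains). The result maps option names to values.
    """
    prefix = "nouveau.config="
    settings = {}
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        # Assumption: several settings are comma-separated,
        # e.g. nouveau.config=PCE0=0,PCRYPT=0
        for item in token[len(prefix):].split(","):
            key, sep, value = item.partition("=")
            if sep:
                settings[key] = value
    return settings


# Illustrative command line; only the nouveau.config part comes from this thread.
example = "root=/dev/sda1 ro nouveau.config=PCE0=0 quiet"
print(nouveau_config(example))
```

On a running system the same function could be fed the contents of /proc/cmdline; if `PCE0` is missing from the result, the boot did not pick up the option.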
OK, yeah, it's not at all the thing I thought it might be -- your PCOPY engine is intact:

$ lookup -a a8 1540 f2010001
PBUS.HWUNITS_0 => { TPC_MASK = 0x1 | PART_MASK = 0x1 | MP_MASK = 0x2 | UNK28 | UNK29 | VDEC | UNK31 }
$ lookup -a a8 154c 0000023d
PBUS.HWUNITS_1 => { PCIE_VERSION = 2 | PCI_CLASS = DISPLAY | PDISPLAY | UNK3 = 0x3 | PVLD | PCIE_SPEED = 2P5GT | PCOPY }

So... it's something else. Sorry, that's of little help to you.

(In reply to comment #6)
> OK, yeah, it's not at all the thing I thought it might be -- your PCOPY
> engine is intact:
>
> $ lookup -a a8 1540 f2010001
> PBUS.HWUNITS_0 => { TPC_MASK = 0x1 | PART_MASK = 0x1 | MP_MASK = 0x2 | UNK28
> | UNK29 | VDEC | UNK31 }
> $ lookup -a a8 154c 0000023d
> PBUS.HWUNITS_1 => { PCIE_VERSION = 2 | PCI_CLASS = DISPLAY | PDISPLAY | UNK3
> = 0x3 | PVLD | PCIE_SPEED = 2P5GT | PCOPY }
>
> So... it's something else. Sorry, that's of little help to you.

Okay, so what do we do now? :)

There was an issue that suggested that 3.10.x worked well for someone but they started seeing issues with nva8 on 3.11. Perhaps you can try a 3.10.x kernel. See bug #69029. If this turns out to help, then do a bisect to figure out the offending commit.

Also it'd be interesting to see a log after the hang happens -- see if you can ssh into the machine after the screen hangs.

(In reply to comment #8)
> There was an issue that suggested that 3.10.x worked well for someone but
> they started seeing issues with nva8 on 3.11. Perhaps you can try a 3.10.x
> kernel. See bug #69029. If this turns out to help, then do a bisect to
> figure out the offending commit.
>
> Also it'd be interesting to see a log after the hang happens -- see if you
> can ssh into the machine after the screen hangs.

If I can SSH into the machine, what information would you want me to snag in order to work further towards solving this issue? I have SSH set up right now.

(In reply to comment #9)
> If I can SSH into the machine, what information would you want me to snag in
> order to work further towards solving this issue? I have SSH set up right
> now.

See if there's anything interesting in dmesg I guess? It also would mean that the machine is fine, just the display has hung.

(In reply to comment #10)
> (In reply to comment #9)
> > If I can SSH into the machine, what information would you want me to snag in
> > order to work further towards solving this issue? I have SSH set up right
> > now.
>
> See if there's anything interesting in dmesg I guess? It also would mean that
> the machine is fine, just the display has hung.

Alright I'll switch back to nouveau and report back when it happens! :)

Hey! The screen locked up and did something far worse than before. I was indeed able to SSH into my machine and I tried to make a log for dmesg but it was consumed by UFW output. However ... I did manage to cp all of my Xorg logs and found an interesting seg fault! Attached the log file.

Created attachment 93667 [details]
Log file with segfault in it
Not sure if this info for Xorg is helpful to the Nouveau folks but, shit, it's something.
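
On the dmesg capture being swamped by UFW output, mentioned above: one way around it is to filter the saved log down to the lines that mention the GPU driver. This is only a sketch under assumptions. The helper name `gpu_lines` and the keyword list are made up for illustration, and the two non-nouveau sample lines are invented placeholders, not taken from any attachment in this report.

```python
def gpu_lines(dmesg_text, keywords=("nouveau", "PGRAPH", "TRAP")):
    """Keep only the log lines that contain one of the given keywords.

    The default keywords are an assumption about what is interesting for
    a nouveau lock-up; adjust them to whatever the log actually contains.
    """
    return [
        line
        for line in dmesg_text.splitlines()
        if any(keyword in line for keyword in keywords)
    ]


# Placeholder input: a firewall line, a GPU line, and an unrelated line.
sample = "\n".join([
    "[UFW BLOCK] IN=eth0 OUT= SRC=192.0.2.1",
    "nouveau E[ PGRAPH] TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT",
    "usb 1-1: new high-speed USB device",
])

for line in gpu_lines(sample):
    print(line)
```

Feeding it the text of a saved dmesg dump (for example one written out over SSH right after the hang) leaves just the driver messages, which are much easier to attach and read.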
(In reply to comment #13)
> Created attachment 93667 [details]
> Log file with segfault in it
>
> Not sure if this info for Xorg is helpful to the Nouveau folks but, shit,
> it's something.

Nope, that's just a crash that happens when the GPU locks up. The DDX isn't very good at handling that.

Created attachment 93820 [details]
dmesg from latest lock up
I think I got something good here! Here's the dmesg from the latest lock up.
The relevant bits:

TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
TRAP
ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data 0x3f800000
PGRAPH TLB flush idle timeout fail
PGRAPH_STATUS : 0x00c00c03 BUSY DISPATCH CCACHE_UNK4 STRMOUT_GSCHED_UNK5 TPVP MP
PGRAPH_VSTATUS0: 0x00000208 CCACHE UNK5
PGRAPH_VSTATUS1: 0x00005600
PGRAPH_VSTATUS2: 0x00000000
TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
TRAP
ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data 0x3f800000
TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
TRAP
ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data 0x3f800000
PGRAPH TLB flush idle timeout fail
PGRAPH_STATUS : 0x00c00c03 BUSY DISPATCH CCACHE_UNK4 STRMOUT_GSCHED_UNK5 TPVP MP
PGRAPH_VSTATUS0: 0x00000208 CCACHE UNK5
PGRAPH_VSTATUS1: 0x00005600
PGRAPH_VSTATUS2: 0x00000000
TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
TRAP

So it looks like the code got overwritten with 0x0d (13) bytes. Never seen a TIMEOUT error before -- neat. Not sure what to make of it though.

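
The observation above, that the faulting opcode consists entirely of 0x0d bytes, can be checked mechanically across a whole log. The sketch below is illustrative only: the regular expression is written against the TRAP_MP_EXEC lines quoted in this report and may not match other message formats, and the names `TRAP_RE`, `fill_byte` and `scan` are made up for this example.

```python
import re

# Matches lines of the form quoted in this report, e.g.
# "TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d"
TRAP_RE = re.compile(
    r"TRAP_MP_EXEC - TP (\d+) MP (\d+): (\w+) at ([0-9a-f]+) "
    r"warp (\d+), opcode ([0-9a-f]{8}) ([0-9a-f]{8})"
)


def fill_byte(opcode_words):
    """Return the byte value if every byte of the opcode is identical, else None."""
    raw = bytes.fromhex("".join(opcode_words))
    return raw[0] if len(set(raw)) == 1 else None


def scan(log_text):
    """Return (error name, address, fill byte or None) for each TRAP_MP_EXEC line."""
    return [
        (m.group(3), m.group(4), fill_byte((m.group(6), m.group(7))))
        for m in TRAP_RE.finditer(log_text)
    ]


line = ("TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 07fcc0 warp 0, "
        "opcode 0d0d0d0d 0d0d0d0d")
print(scan(line))  # [('INVALID_OPCODE', '07fcc0', 13)]
```

A non-None fill byte on every trap at the same address is consistent with the code having been overwritten by a repeated byte pattern, though it does not say what did the overwriting.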
(In reply to comment #16)
> The relevant bits:
>
> TRAP_MP_EXEC - TP 0 MP 1: INVALID_OPCODE at 07fcc0 warp 0, opcode 0d0d0d0d
> 0d0d0d0d
> TRAP
> ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data
> 0x3f800000
> PGRAPH TLB flush idle timeout fail
> PGRAPH_STATUS : 0x00c00c03 BUSY DISPATCH CCACHE_UNK4 STRMOUT_GSCHED_UNK5
> TPVP MP
> PGRAPH_VSTATUS0: 0x00000208 CCACHE UNK5
> PGRAPH_VSTATUS1: 0x00005600
> PGRAPH_VSTATUS2: 0x00000000
> TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
> TRAP
> ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data
> 0x3f800000
> TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
> TRAP
> ch 5 [0x003fc34000 compiz[2897]] subc 3 class 0x8597 mthd 0x0f04 data
> 0x3f800000
> PGRAPH TLB flush idle timeout fail
> PGRAPH_STATUS : 0x00c00c03 BUSY DISPATCH CCACHE_UNK4 STRMOUT_GSCHED_UNK5
> TPVP MP
> PGRAPH_VSTATUS0: 0x00000208 CCACHE UNK5
> PGRAPH_VSTATUS1: 0x00005600
> PGRAPH_VSTATUS2: 0x00000000
> TRAP_MP_EXEC - TP 0 MP 1: TIMEOUT at 07fcc0 warp 0, opcode 0d0d0d0d 0d0d0d0d
> TRAP
>
> So it looks like the code got overwritten with 0x0d (13) bytes. Never seen a
> TIMEOUT error before -- neat. Not sure what to make of it though.

I hope this was a good chunk of info here. Thanks for the SSH idea by the way. Saved a thrashing on my hardware! So what's up now? What's the plan? :))

Hey, I'm still getting this issue to this day with Ubuntu 14.04

grave_123@riseup.net, given you are using a downstream version of nouveau, it will help immensely if you filed a new report with Ubuntu by ensuring you have the package xdiagnose installed, and that you click the Yes button for attaching additional debugging information running the following from a terminal:

ubuntu-bug xorg

Also, please feel free to subscribe me to it.

For more on why this is helpful, please see https://wiki.ubuntu.com/ReportingBugs.