Summary: | [NVC8] linux-nouveau2.6 (3.6.0-rc4) : GTX580 : Xorg freezes when using accel
---|---
Product: | xorg
Reporter: | Eric <3rik.gm>
Component: | Driver/nouveau
Assignee: | Nouveau Project <nouveau>
Status: | RESOLVED FIXED
QA Contact: | Xorg Project Team <xorg-team>
Severity: | critical
Priority: | high
CC: | bryce, doityourselfteam, leifer
Version: | git
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
See Also: | https://bugzilla.redhat.com/show_bug.cgi?id=860477, https://launchpad.net/bugs/1039202
Description
Eric
2012-09-03 12:53:10 UTC
Please also fix those bugs for linux-nouveau 3.4.10.

Best regards,
Eric

Hello,

Using external GTX 580 firmware helped!

Eric

This bug persists. I consider it to be a pretty serious bug, too. Anyone with an nvc8 card who runs Gnome or Unity will have the card freeze up in seconds, although somewhat less frequently with KDE. Running Firefox will immediately freeze the card. Ubuntu and Fedora both ship their live CDs with nouveau enabled by default, so people cannot even run the installer for more than a minute, and most will have no idea what went wrong.

Note that most nvc0 cards run just fine. I even have an nvc4 in another computer that runs perfectly; it is specifically nvc8 that has this problem, and very few devs (if any?) seem to possess one. Feel free to google "nouveau gtx 580" to see that this is hitting a decent number of people.

The card works just fine if using the extracted firmware, but this is a poor solution. I have been reading the various envytools/hwdocs on the fuc and trying to investigate this, but I have hit a wall; this issue is just too difficult for me to handle. I am pretty sure the solution is to do whatever Ben did to get the nvd7/nvd9 cards working, which looked like adding chipset-specific firmware data to the loading code, but I don't know nearly enough to do this myself. If anyone has some advice, please let me know. I would really like to see this bug closed.

Created attachment 84279 [details]
kernel log of card freeze
Created attachment 84280 [details]
kernel log of card freeze
Another dmesg log of the freeze. Note that the read fault is not always at the same address.
Worth trying 3.11-rc6. A bunch of changes went into 3.11-rc1 related to register setup on nvc0+ cards.

These logs are from the nouveau git after those changes hit. I've been tracking the git changes pretty carefully and they looked promising but, alas, didn't work.

*** Bug 45517 has been marked as a duplicate of this bug. ***

(In reply to comment #3)
> The card works just fine if using the extracted firmware

Can you elaborate on how to do that? This sounds like a better solution than using the proprietary nvidia driver.

*** Bug 81614 has been marked as a duplicate of this bug. ***

This bug is affecting me also, see the last duplicated bug. Any progress in fixing this? Maybe some help in testing (for ex.) required?

(In reply to comment #12)
> This bug is affecting me also, see the last duplicated bug. Any progress in
> fixing this? Maybe some help in testing (for ex.) required?

It's a bit of a mystery unfortunately. Adding to the annoyance, Ben said that it does work just fine on his NVC8, although he has the less powerful versions. Could be something with high ROP/TPC/GPC counts not being handled. (Or multiple PARTs?)

That might actually be an interesting experiment -- before loading nouveau, mask out a bunch of the units and see if it helps. If it does, find the "breaking" point.

This is the code that computes that stuff:

http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/engine/graph/nvc0.c#n1330

    priv->rop_nr = (nv_rd32(priv, 0x409604) & 0x001f0000) >> 16;
    priv->gpc_nr = nv_rd32(priv, 0x409604) & 0x0000001f;
    for (i = 0; i < priv->gpc_nr; i++) {
        priv->tpc_nr[i] = nv_rd32(priv, GPC_UNIT(i, 0x2608));
        priv->tpc_total += priv->tpc_nr[i];
    }

Step 1: Print out the various values (i.e. number of ROPs, GPCs, and the per-GPC TPC counts).
Step 2: Artificially lower them (to, e.g., 1) and see if it helps. If it does, figure out which of the values matter and where the breaking points are.

If it doesn't help, perhaps the units need to be disabled a little harder, e.g. by setting 0x22584/0x22588.

(In reply to comment #13)
> It's a bit of a mystery unfortunately. Adding to the annoyance, Ben said
> that it does work just fine on his NVC8, although he has the less powerful
> versions. Could be something with high ROP/TPC/GPC counts not being handled.
> (Or multiple PARTs?)
>
> That might actually be an interesting experiment -- before loading nouveau,
> mask out a bunch of the units and see if it helps. If it does, find the
> "breaking" point.
>
> This is the code that computes that stuff:
>
> http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/engine/graph/nvc0.c#n1330
>
> priv->rop_nr = (nv_rd32(priv, 0x409604) & 0x001f0000) >> 16;
> priv->gpc_nr = nv_rd32(priv, 0x409604) & 0x0000001f;
> for (i = 0; i < priv->gpc_nr; i++) {
>     priv->tpc_nr[i] = nv_rd32(priv, GPC_UNIT(i, 0x2608));
>     priv->tpc_total += priv->tpc_nr[i];
> }
>
> Step 1: Print out the various values (i.e. number of ROPs, GPCs, and the
> per-GPC TPC counts).
> Step 2: Artificially lower them (to, e.g., 1) and see if it helps. If it
> does, figure out which of the values matter and where the breaking points
> are.
>
> If it doesn't help, perhaps the units need to be disabled a little harder,
> e.g. by setting 0x22584/0x22588.

Can you describe in more detail what I need to do? I'm afraid I'm not advanced enough at the moment to understand everything in your comment. Maybe not in the comments but by e-mail: doityourselfteam@gmail.com

*** Bug 81614 has been marked as a duplicate of this bug. ***

(In reply to comment #13)
> (In reply to comment #12)
> > This bug is affecting me also, see the last duplicated bug. Any progress in
> > fixing this? Maybe some help in testing (for ex.) required?
>
> It's a bit of a mystery unfortunately. Adding to the annoyance, Ben said
> that it does work just fine on his NVC8, although he has the less powerful
> versions. Could be something with high ROP/TPC/GPC counts not being handled.
> (Or multiple PARTs?)
>
> That might actually be an interesting experiment -- before loading nouveau,
> mask out a bunch of the units and see if it helps. If it does, find the
> "breaking" point.
>
> This is the code that computes that stuff:
>
> http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/engine/graph/nvc0.c#n1330
>
> priv->rop_nr = (nv_rd32(priv, 0x409604) & 0x001f0000) >> 16;
> priv->gpc_nr = nv_rd32(priv, 0x409604) & 0x0000001f;
> for (i = 0; i < priv->gpc_nr; i++) {
>     priv->tpc_nr[i] = nv_rd32(priv, GPC_UNIT(i, 0x2608));
>     priv->tpc_total += priv->tpc_nr[i];
> }
>
> Step 1: Print out the various values (i.e. number of ROPs, GPCs, and the
> per-GPC TPC counts).
> Step 2: Artificially lower them (to, e.g., 1) and see if it helps. If it
> does, figure out which of the values matter and where the breaking points
> are.
>
> If it doesn't help, perhaps the units need to be disabled a little harder,
> e.g. by setting 0x22584/0x22588.

Here are the printed-out values:

[ 3.185455] Rop nr: 6
[ 3.185457] Gpc nr: 4
[ 3.185460] Tpc nr for gpc 0: 4
[ 3.185463] Tpc nr for gpc 1: 4
[ 3.185466] Tpc nr for gpc 2: 4
[ 3.185469] Tpc nr for gpc 3: 4

I tried setting them all to 1. The card freezes pretty much immediately after logging into kwin (which is when I suspect OpenGL rendering starts), although oddly enough there was no read fault in the dmesg. I also tried setting them all to 2, and it froze pretty quickly too, and the machine became completely unrecoverable. Note that I also tried using the blob firmware with all values set to 2, so I think not having them at their natural amounts simply pisses the card off.

I didn't try directly disabling stuff with 0x22584/0x22588; I'm not entirely sure where I would even do that.

Created attachment 103833 [details]
dmesg from GPU lockup
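For readers following the experiment from the earlier comments, the bit-field decoding in the quoted snippet can be sketched in plain C without hardware access. This is only an illustration: `unit_counts`, `decode_unit_counts`, `clamp_count`, and `print_unit_counts` are hypothetical names that do not exist in nouveau, and the function takes an already-read 32-bit value instead of calling `nv_rd32`. The per-GPC TPC counts are not modeled, since they come from separate register reads.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Unit counts packed into the value read from register 0x409604, using the
 * masks from the snippet quoted in the thread. Hypothetical helper type. */
struct unit_counts {
    uint32_t rop_nr; /* bits 20:16 of the register value */
    uint32_t gpc_nr; /* bits 4:0 of the register value */
};

/* Decode a raw register value with the same masks and shift as the thread. */
struct unit_counts decode_unit_counts(uint32_t reg)
{
    struct unit_counts c;
    c.rop_nr = (reg & 0x001f0000u) >> 16;
    c.gpc_nr = reg & 0x0000001fu;
    return c;
}

/* Step 2 of the experiment: cap a count at `limit`, which must be >= 1. */
uint32_t clamp_count(uint32_t count, uint32_t limit)
{
    assert(limit >= 1);
    return count < limit ? count : limit;
}

/* Step 1 of the experiment: print the decoded counts. */
void print_unit_counts(const struct unit_counts *c)
{
    printf("Rop nr: %u\n", (unsigned)c->rop_nr);
    printf("Gpc nr: %u\n", (unsigned)c->gpc_nr);
}
```

As a usage example, a made-up register value of `0x00060004` (constructed to match the counts reported above, not taken from a real register dump) decodes to 6 ROPs and 4 GPCs, and `clamp_count(4, 1)` gives the lowered value 1 tried in the experiment.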
I'm affected by this bug as well. Card is an ASUS ENGTX580 DCII.

Distro: Arch Linux
X.Org: 1.16
mesa: 10.2.4
xf86-video-nouveau: 1.0.10

I have a Korean monitor, so there are some errors about missing EDID in the dmesg, but even without an xorg.conf nouveau detected the 2560x1440 resolution, and Gnome 3 looked fine until it locked up. The lockup happened while I was typing in a terminal in Gnome 3.12.2. It came out of nowhere. I attached my dmesg log.

I rebooted and got another lockup very quickly, but this time without the long list of nouveau E[ PDISP] messages. The channel value is the same, but the process is Xorg.bin, not mutter-launch.

read fault at 0x4391800000 [PT_NOT_PRESENT] from PGRAPH/CTXCTL on channel 0x005fb79000 [Xorg.bin[909]]

My computer survived the night with the latest patchset that made it into 3.17, so I am marking this as fixed.