System:
(B)LFS i686-pc-linux-gnu
kernel 3.10.10 (with its nouveau module)
xorg-server v1.14.1
libdrm: git 2013-07-17 18:44 EST
Xorg_nouveau: git 2013-07-17 18:46 EST
Mesa v9.1.1
Motherboard: "ASUS P5E-VM HDMI" with Intel G35/ICH9R, Intel Core2Duo E8400@3.0GHz, 4 GB.
Graphics card: EVGA GeForce GT430 (active), connected to an ASUS VE258 monitor.
---------------------------------------------------------
Description
1. Boot up the system to runlevel 3 (text/VGA/console mode).
2. Start Xorg (also known as "X", "X11").
3. The system hangs. The mouse pointer is visible and can be moved, but that's all. The only way out is to hard reboot the machine.
----------------------------------------------------------
Comments
1. Up to and including 3.10.10, Xorg behaves normally.
2. A git bisect procedure starting with

   git bisect good v3.10
   git bisect bad v3.11-rc7

reaches the last bisect step. At that point, after compiling and rebooting, the system hangs somewhere in the boot-up sequence.

   git show
   commit 5ee86c4190f9e19a9e13906389069c73d7f75bfb
   Author: Ben Skeggs <bskeggs@redhat.com>
   Date:   Mon Jul 1 14:32:42 2013 +1000

       drm/nve0-/gr: some new gpc registers can have multiple copies

       GK110 exposes more than one, and needs to be dealt with in the ctxsw
       ucode just like the TPC sets are. Broadcast is at +0xe00.
   ...
   cat .git/HEAD   # For confirmation
   5ee86c4190f9e19a9e13906389069c73d7f75bfb

Please note that at this point the bisect is at 0 revisions left, 1 step. Unfortunately, this commit brings an additional problem: the boot hangs.
3. See attachments:
   Xorg.0.log-3.10.10good
   Xorg.0.log-3.11bug   # The most telling as to the problem
   config-3.10.10
   config-3.11
4. In trying to narrow down the "exact" commit responsible for the subject bug, I created a "timeline" of events related to the problems befalling my system.

   A-----a---b---c---d---e---f---g-----B   (to the left: earlier in time)

Note: the "line" component of the word "timeline" is from a bisect standpoint.
Actually, there are a lot of branches and merges along the way.

Legend:
A: v3.10       X11 Good (a good "bisect good" start)
B: v3.11-rc1   X11 Bug (a good "bisect bad" start)
a: 791dc143    X11 Good                                          Jun 27 17:35
b: 7c397cd9    X11 Good                                          Jun 28 14:24
c: 56fbd2b6    Hang on boot-up                                   May  2 12:37
d: c03ff9e8    Hang on boot-up                                   Jul  1 10:54
e: 5ee86c41    Hang on boot-up                                   Jul  1 14:32
f: 70f824ac    X11 starts but cursor is flaky (and soon lost)    Jul  1 14:48
g: 8f6fe267    X11 Bug                                           Jul  4 13:03

Commit references:
791dc143ed2c441f5202d8721609d94dce9fcf88
  Author: Maarten Lankhorst
  drm/nvd0-/disp: handle case where display engine is missing/disabled
  Signed-off-by: Ben Skeggs
7c397cd97b8f46659698396b420bd48c3e6703e6
  Author: Joonyoung Shim
  drm: add mmap function to prime helpers
  Signed-off-by: Joonyoung Shim and Dave Airlie
56fbd2b65446d4fb4df7770c49a70d563b7569c9
  Author: Ben Skeggs
  drm/nvf0/fifo: enable support
  Signed-off-by: Ben Skeggs
  [drivers/gpu/drm/nouveau]
c03ff9e8fa5fc0186158b99a89f613325ff352cf
  Author: Ben Skeggs
  drm/nvc0-/gr: pull out a group of separately context-switched gpc regs
  Signed-off-by: Ben Skeggs
  [drivers/gpu/drm/nouveau]
5ee86c4190f9e19a9e13906389069c73d7f75bfb
  Author: Ben Skeggs
  drm/nve0-/gr: some new gpc registers can have multiple copies
  Signed-off-by: Ben Skeggs
  [drivers/gpu/drm/nouveau]
70f824ac8c369194e9499c59e687c6aa8b1a10c8
  Author: Ben Skeggs
  drm/nvc0-/gr: tpc regs a subset of gpc, add separate list for gpc/unk regs
  Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
  [drivers/gpu/drm/nouveau]
8f6fe26745d39299d43d79dd7ba9838517624c3f
  Author: Ben Skeggs
  drm/nvf0/gr: build cs ucode for GK110
  Signed-off-by: Ben Skeggs
  [drivers/gpu/drm/nouveau]

CONCLUSION
The bug was introduced in, and is still lurking in, the "nouveau" code of the "official" 3.11 release. The NVIDIA GT430 uses the "nouveau" driver, so it is affected by the subject bug as described.

Note: There is a "fix gpc firmware regression" commit in the nouveau branch, a0b7ebfb39bf80078d12bd1a46ec40d0d0dbdeef, which includes some patches.
Apparently, the corresponding "mainline" commit is d2989b534ef6834ebf2425aecc040b894b567c91, but it doesn't seem to help with the "problem" commit, 5ee86c4190f9e19a9e13906389069c73d7f75bfb (nor can one test it as long as the system hangs at boot!).
Created attachment 85596: Xorg log of the buggy version, 3.11
Created attachment 85597: Xorg log of the last "official" release of 3.10.x, 3.10.10
Created attachment 85598: '.config' of the buggy version, 3.11
Created attachment 85599: '.config' of the last "official" release of 3.10.x, 3.10.10
The stuff in the X logs is just a side effect of nouveau_bo_wait hanging. Is there anything interesting in dmesg? Can you try

git checkout 5ee86c4190f9e
git cherry-pick d2989b534ef6834ebf2425aecc040b894b567c91

and run that kernel? (After that, you'll need to use git reset --hard to get back into a good state.)

BTW, don't let the dates confuse you: they have nothing to do with commit order. For example, 5ee86c41 is an ancestor of 56fbd2b65 even though it's dated later.
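The checkout/cherry-pick/reset sequence suggested above can be wrapped in a small shell function. This is only a sketch: the name try_fix_on is hypothetical, and it calls git reset --hard (as suggested above) as soon as the cherry-pick fails, for example on a conflict like the one in the next comment, so the tree is clean again. Note that git reset --hard also discards any uncommitted local changes.

```sh
# Hypothetical helper: check out BASE as a detached HEAD and cherry-pick FIX
# on top of it, so the combination can be built and booted.
# Run inside the kernel git tree, e.g.:
#   try_fix_on 5ee86c4190f9e d2989b534ef6834ebf2425aecc040b894b567c91
try_fix_on() {
    base=$1
    fix=$2
    git checkout -q --detach "$base" || return 1
    if ! git cherry-pick "$fix" >/dev/null 2>&1; then
        # The pick failed (e.g. a conflict): drop the half-applied
        # changes and go back to BASE.
        echo "cherry-pick of $fix onto $base failed; tree reset to $base" >&2
        git reset -q --hard
        return 1
    fi
    echo "HEAD is now $base plus $fix; build and boot this kernel"
}
```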
> Can you try git checkout 5ee86c4190f9e git cherry-pick d2989b534ef6834ebf2425aecc040b894b567c91
> And run that kernel?

Hi Ilia,

[~/linux]$ git status
# On branch master
nothing to commit, working directory clean
[~/linux]$ git checkout 5ee86c4190f9e
Note: checking out '5ee86c4190f9e'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 5ee86c41... drm/nve0-/gr: some new gpc registers can have multiple copies
[~/linux]$ git cherry-pick d2989b534ef6834ebf2425aecc040b894b567c91
error: could not apply d2989b5... drm/nvc0/gr: fix gpc firmware regression
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'
-------------------------------------------------------------------------------
BTW (this might shed some extra light): yesterday, before resubmitting the bug, I tried to apply the a0b7ebfb39bf80078d12bd1a46ec40d0d0dbdeef patches to 5ee (edited to fit the master configuration). I called the new patch "M.patch" (for Maarten). This is what I got:

[~/linux]$ patch -p1 < /root/patches/M.patch ; echo $?
patching file drivers/gpu/drm/nouveau/core/engine/graph/fuc/gpc.fuc
Hunk #1 succeeded at 353 (offset 1 line).
patching file drivers/gpu/drm/nouveau/core/engine/graph/fuc/gpcnvc0.fuc.h
Hunk #1 FAILED at 400.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/nouveau/core/engine/graph/fuc/gpcnvc0.fuc.h.rej
1

so I gave up. Caveat: I may have messed up the patches (original and edited); I'll attach them for your critique.
--
Alex
Created attachment 85636: "original" patch
Created attachment 85637: "original" patch edited to fit my perceived reality
In my comment #6, I forgot to add this before "so I gave up.": I went ahead and compiled/built the kernel (successfully), but, as always, it hung on boot-up. I hope this makes the "so I gave up" clearer. Sorry.
UPDATES and CLARIFICATIONS (to whom it may concern)

1. The bug persists in 3.11.1 ("Latest Stable Kernel").

2. The boot-up hang problem
Encountered while bisecting for the subject bug: certain commits hang on boot. Hard (you have to press the Chernobyl button to restart the machine).
Note: as a "side effect" of those occurrences, the subject X11 bug becomes moot. (I suspect some would consider this a _good_ side effect.)
I pinpointed the offending commit by looking at the parents of c03ff9e8:

c03ff9e8fa5fc0186158b99a89f613325ff352cf   hangs on boot-up
30f4e0870d1726f31aa59804337cfd5e0a3f2ec7   hangs on boot-up
791dc143ed2c441f5202d8721609d94dce9fcf88   GOOD
...
a0376b1481fdb9c9e8064ea0c5af8bd80da3f8f3   GOOD
...
79442c3af0525e81d4598e272abe5db60c489c62   GOOD
...
99bd5537bd22256866d83033e0aab2586616bcc2   GOOD
------
So the boot-up hard HANG problem starts with

commit 30f4e0870d1726f31aa59804337cfd5e0a3f2e
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Sun Jun 9 16:08:22 2013 +1000

    drm/nvc0-/gr: make register lists from initvals functions

    Generated context verified to be the same for all supported chipsets.

    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
---------------------------------------------------------
3. To clear up the confusion I created by mixing up dates with commit order, ancestry and the like:
I performed a "manual" bisect procedure in the "lineage" of 8f6fe267, where the subject bug is introduced (and where 5ee86c41 is indeed an ancestor of 56fbd2b65).

Legend
GOOD  - Clean boot-up to text/console mode (runlevel 3). Clean X11/Xorg (runlevel 5).
HANG  - Hard hang on boot-up (to text/console mode).
FLAKY - Flaky X11 cursor: after, say, starting xterm and exiting it, the cursor disappears and you have to restart the machine.
BAD   - X11/Xorg is hung. FWIW, the cursor can still be moved with the left mouse. No other reaction. The subject BUG.
The parents of 8f6fe267:

8f6fe26745d39299d43d79dd7ba9838517624c3f   BAD
960b4381c5fff0b0f16f4b812082811dde1ab7ab   BAD
0085a60524aeb743c15bbdf7354f4e4f6623243e   BAD
18ac4246510bee85d2efd6ed536b436e246f7624   BAD
9d1c4c51ce9cd133a25b339be398e35663cc2ae5   BAD
b054aadfb0030b9717bb22f4283bfe5aec13440b   BAD
9ec2dbba9fedbd1788849fb00d659ebdf549a4f8   BAD
56fbd2b65446d4fb4df7770c49a70d563b7569c9   BAD
26410c679865bcfcbe18422ca1eb472cf12ea82d   BAD
a32b2ffb82b5a386a13fde40dc131f853636dcf5   BAD
70f824ac8c369194e9499c59e687c6aa8b1a10c8   FLAKY
5ee86c4190f9e19a9e13906389069c73d7f75bfb   HANG
c03ff9e8fa5fc0186158b99a89f613325ff352cf   HANG
30f4e0870d1726f31aa59804337cfd5e0a3f2ec7   HANG
791dc143ed2c441f5202d8721609d94dce9fcf88   GOOD
...
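The manual classification above maps naturally onto git bisect's own verdicts. A sketch, assuming (this is not from the report) that FLAKY counts as a symptom of the bug and that HANG commits, which never get far enough to test X11, are passed over with git bisect skip. The helper name mark is hypothetical. The bisect can also be limited to commits that touch nouveau by giving a pathspec, e.g. git bisect start 8f6fe267 791dc143 -- drivers/gpu/drm/nouveau.

```sh
# Hypothetical helper: record the result of booting the kernel that
# git bisect has currently checked out, using this report's legend.
# Start the bisect with, e.g.:
#   git bisect start 8f6fe267 791dc143 -- drivers/gpu/drm/nouveau
mark() {
    case "$1" in
        GOOD)      git bisect good ;;
        BAD|FLAKY) git bisect bad ;;    # assumption: FLAKY is the same bug
        HANG)      git bisect skip ;;   # untestable: hangs before X11 starts
        *)         echo "usage: mark GOOD|BAD|FLAKY|HANG" >&2; return 2 ;;
    esac
}
```

If only skipped commits remain between the last good and first bad commit, git bisect reports all of them as possible first bad commits rather than guessing, which is the honest answer when those kernels cannot reach X11.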
*** Bug 69488 has been marked as a duplicate of this bug. ***
*** Bug 69830 has been marked as a duplicate of this bug. ***
I have switched to a different graphics card (GT220) and it's working fine with 3.12-rc2. This means I'm free to ship my GT430 to a developer for testing/debugging if desired.
*** Bug 69768 has been marked as a duplicate of this bug. ***
I've tried going through the commit, but after ~30 minutes, with half of the patch gone through, there were still more than 6000 lines left :\

Alex, Alan, Marc: can any of you do an mmiotrace [1] of _nouveau_ before and after the offending patch?

Thanks

[1] http://nouveau.freedesktop.org/wiki/MmioTrace/
I don't really know how to apply this patch to my current kernel, or whether I need to recompile or do anything else.
(In reply to comment #16)
> i don't really know how to apply this patch to my current kernel, if i need
> to recompile or anything else

I do not think that applying/reverting the patch on top of the current kernel is possible. Here is an example "how to":

1. Clone the git tree
   $ git clone git://anongit.freedesktop.org/nouveau/linux-2.6
2. Checkout at good/working commit
   $ git checkout -B fdo69203.good 30f4e0870d1726f31aa59804337cfd5e0a3f2e~1
3. Rebuild the kernel and install
4. MMIO trace it
5. Checkout the first bad/non-working commit
   $ git checkout -B fdo69203.good 30f4e0870d1726f31aa59804337cfd5e0a3f2e
6. Rebuild the kernel and install
7. MMIO trace it
8. Email the dumps to mmio dot trace at gmail dot com
9. [Optional] Diff the traces and attach the output :)
(In reply to comment #17)
> 5. Checkout the first bad/non-working commit
> $ git checkout -B fdo69203.good 30f4e0870d1726f31aa59804337cfd5e0a3f2e

ouch
s/fdo69203.good/fdo69203.bad

A couple of notes:
* The branch names "fdo69203.good" and "fdo69203.bad" are chosen just for clarity.
* The commit in step 2 differs from that in step 5. The "~1" at the end refers to the parent commit.
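Steps 2 and 5, with the branch-name correction above, can be sketched as one helper that only creates the two branches; building, installing and tracing are still done by hand. The function name is hypothetical. It uses git branch -f instead of git checkout -B so that nothing is switched yet (git refuses to force-move the branch that is currently checked out).

```sh
# Hypothetical helper for steps 2 and 5: point fdo69203.good at the parent
# of the first bad commit, and fdo69203.bad at the first bad commit itself.
#   make_trace_branches 30f4e0870d1726f31aa59804337cfd5e0a3f2e
make_trace_branches() {
    first_bad=$1
    # "~1" selects the first parent, i.e. the last known good commit.
    git branch -f fdo69203.good "$first_bad~1" &&
        git branch -f fdo69203.bad "$first_bad"
}
```

Afterwards, git checkout fdo69203.good, rebuild, install and trace; then do the same with fdo69203.bad.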
Something is missing... "ouch": what is this command?
(In reply to comment #19)
> it miss something...
>
> ouch
>
> what is this command?

I made a typo in step 5: fdo69203.good should read fdo69203.bad :)
Please try the latest nouveau/master. Commit http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=968a8d1b6c32c9f466f236032770b9165ece045a directly addresses the part of the bisected commit 30f4e0870d1726f31aa59804337cfd5e0a3f2ec7 that concerns nvc1.
It's really hard to clone this git repository; I always get a timeout. Are there any mirrors?
(In reply to comment #17)
> $ git clone git://anongit.freedesktop.org/nouveau/linux-2.6
>
The above git repo can be replaced with any mirror of Linus' tree.

(In reply to comment #22)
> really hard to clone this git repository..... alway get a timeout... any
> mirrors?

Strange, fd.o seems to work fine here. Either way, give this a try: check out the nouveau-master branch over at https://github.com/evelikov/linux
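If full clones keep timing out, a shallow, single-branch clone downloads far less data. A minimal sketch: the helper name is hypothetical, while --depth and --branch are standard git clone options (--depth implies --single-branch, so other branches are not fetched).

```sh
# Hypothetical helper: clone only the tip commit of one branch.
#   shallow_clone https://github.com/evelikov/linux nouveau-master linux-nouveau
shallow_clone() {
    url=$1
    branch=$2
    dir=$3
    git clone -q --depth 1 --branch "$branch" "$url" "$dir"
}
```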
OK, I downloaded https://github.com/evelikov/linux/archive/nouveau-master.zip. Should that be OK?
I get:

*** No rule to make target `drivers/ata/libata-core.o', needed by `drivers/ata/libata.o'. Stop.

when I run make.
(In reply to comment #25)
> i get:
> *** No rule to make target `drivers/ata/libata-core.o', needed by
> `drivers/ata/libata.o'. Stop.
>
> when i run make.

Sorry, but this is not even remotely related to nouveau :\ Follow your distribution's instructions on "how to rebuild the kernel", or ask the $insert_distro_name$ package maintainer to apply the mentioned commit.
I get the same problem with another distribution. I downloaded the final 3.12 directly, and without passing these boot parameters:

ro showopts apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset x11failsafe

I can't boot, and I get:

MMIO read of 0x000000000 fault a 1x10f20c
MMIO read of 0x000000000 fault a 1x17e8dc
Created attachment 88797: photo of the boot error with kernel 3.12
Created attachment 88798: photo of the boot error with kernel 3.12
If the suggested patch indeed fixes your issue, it would be expected that the final 3.12 release would still exhibit problems, as that patch is not upstream. Either pull the branch Emil pointed you at + build kernel, or pull nouveau/master from fd.o, or apply the patch manually to (nearly) any tree.
Isn't it possible to build and install only the module?
I cloned kernel 3.12.0 and took this patch:
http://cgit.freedesktop.org/nouveau/linux-2.6/patch/?id=968a8d1b6c32c9f466f236032770b9165ece045a
so the file looks like this one, without the email part:
http://pastebin.ca/2474649

patch -p1 < ../nouveau.patch
patching file drivers/gpu/drm/nouveau/core/engine/device/nv04.c
Hunk #1 FAILED at 57.
Hunk #2 FAILED at 75.
2 out of 2 hunks FAILED -- saving rejects to file drivers/gpu/drm/nouveau/core/engine/device/nv04.c.rej
patching file drivers/gpu/drm/nouveau/core/engine/device/nv10.c
Hunk #1 FAILED at 76.
Hunk #2 FAILED at 95.
Hunk #3 FAILED at 114.
Hunk #4 FAILED at 133.
Hunk #5 FAILED at 152.
Hunk #6 FAILED at 171.
Hunk #7 FAILED at 190.
7 out of 7 hunks FAILED -- saving rejects to file drivers/gpu/drm/nouveau/core/engine/device/nv10.c.rej
patching file drivers/gpu/drm/nouveau/core/engine/device/nv20.c
Hunk #1 FAILED at 60.
Hunk #2 FAILED at 79.
Hunk #3 FAILED at 98.
Hunk #4 FAILED at 117.
4 out of 4 hunks FAILED -- saving rejects to file drivers/gpu/drm/nouveau/core/engine/device/nv20.c.rej
patching file drivers/gpu/drm/nouveau/core/engine/device/nv30.c
Hunk #1 FAILED at 60.
Hunk #2 FAILED at 79.
Hunk #3 FAILED at 98.
Hunk #4 FAILED at 118.
Hunk #5 FAILED at 138.
5 out of 5 hunks FAILED -- saving rejects to file drivers/gpu/drm/nouveau/core/engine/device/nv30.c.rej
patching file drivers/gpu/drm/nouveau/core/engine/device/nv40.c
Hunk #1 FAILED at 63.
Hunk #2 FAILED at 84.
Hunk #3 FAILED at 105.
Hunk #4 FAILED at 126.
Hunk #5 FAILED at 147.
Hunk #6 FAILED at 168.
Hunk #7 FAILED at 189.
Hunk #8 FAILED at 210.
Hunk #9 FAILED at 231.
Hunk #10 FAILED at 252.
Hunk #11 FAILED at 273.
Hunk #12 FAILED at 294.
Hunk #13 FAILED at 315.
Hunk #14 FAILED at 336.
Hunk #15 FAILED at 357.
Hunk #16 FAILED at 378.
16 out of 16 hunks FAILED -- saving rejects to file drivers/gpu/drm/nouveau/core/engine/device/nv40.c.rej
patching file drivers/gpu/drm/nouveau/core/engine/device/nv50.c
Hunk #1 FAILED at 71.
Hunk #2 FAILED at 94.
Hunk #3 FAILED at 120.
Hunk #4 FAILED at 146.
Hunk #5 FAILED at 172.
Hunk #6 FAILED at 198.
Hunk #7 FAILED at 224.
Hunk #8 FAILED at 250.
Hunk #9 FAILED at 276.
Hunk #10 FAILED at 302.
Hunk #11 FAILED at 328.
Hunk #12 FAILED at 355.
Hunk #13 FAILED at 381.
Hunk #14 FAILED at 407.
14 out of 14 hunks FAILED -- saving rejects to file drivers/gpu/drm/nouveau/core/engine/device/nv50.c.rej
patching file drivers/gpu/drm/nouveau/core/engine/device/nvc0.c
Hunk #1 FAILED at 73.
Hunk #2 FAILED at 102.
Hunk #3 FAILED at 131.
Hunk #4 FAILED at 159.
Hunk #5 FAILED at 188.
Hunk #6 FAILED at 217.
Hunk #7 FAILED at 245.
Hunk #8 FAILED at 274.
Hunk #9 FAILED at 302.
9 out of 9 hunks FAILED -- saving rejects to file drivers/gpu/drm/nouveau/core/engine/device/nvc0.c.rej
patching file drivers/gpu/drm/nouveau/core/engine/device/nve0.c
Hunk #1 FAILED at 73.
Hunk #2 FAILED at 103.
Hunk #3 FAILED at 133.
Hunk #4 FAILED at 163.
Hunk #5 FAILED at 196.
(In reply to comment #32)
> i cloned kernel 3.12.0
>
> i took this patch:
> http://cgit.freedesktop.org/nouveau/linux-2.6/patch/
> ?id=968a8d1b6c32c9f466f236032770b9165ece045a
>
> so the file look like this one without email inside...:
> http://pastebin.ca/2474649

Perhaps take a moment to follow those links and see whether they look similar? [hint: they don't]
I removed the email part... and now the same thing:

patching file drivers/gpu/drm/nouveau/core/engine/graph/ctxnvc1.c
patching file drivers/gpu/drm/nouveau/core/engine/graph/ctxnvd7.c
patching file drivers/gpu/drm/nouveau/core/engine/graph/ctxnvd9.c
patch unexpectedly ends in middle of line
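Both failures above seem to come from applying a file that is not the intended patch (apparently a different file first, then what looks like a truncated copy). A safer routine, sketched here with a hypothetical helper name, is to save the raw output of the cgit /patch/ link unchanged and dry-run it with git apply before touching the tree. git apply skips the mail header in front of the diff, so there is no need to strip it by hand.

```sh
# Hypothetical helper: dry-run a patch against the current tree.
# Prints a diffstat, then verifies that every hunk applies; changes nothing.
#   curl -o nouveau.patch 'http://cgit.freedesktop.org/nouveau/linux-2.6/patch/?id=968a8d1b6c32c9f466f236032770b9165ece045a'
#   check_patch nouveau.patch && git apply nouveau.patch
check_patch() {
    patch_file=$1
    git apply --stat "$patch_file" || return 1
    git apply --check "$patch_file"
}
```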
If you're unable to apply patches and clone repos, you'll be best served by waiting for 3.13-rc1, which should contain the patch in question.
Good news: the patch works fine. I can boot, suspend to RAM and wake the machine up; everything works fine.
I've recommended the patch in question for -stable, so it should be making its way to the next 3.11.x and 3.12.x releases.