| Summary: | [nouveau, linux-3.7-rc] Broken cursor and kernel log swamped with trapped reads/writes from BAR/PFIFO_READ/FB | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | xorg | Reporter: | Bruno <bonbons> | ||||||||||||||
| Component: | Driver/nouveau | Assignee: | Nouveau Project <nouveau> | ||||||||||||||
| Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> | ||||||||||||||
| Severity: | normal | ||||||||||||||||
| Priority: | medium | CC: | alzeih, ankur, bpierce815, bugs-freedesktop, chrisf, dh.herrmann, freedom, gsomlo, hebert.soares, ronny.standtke, rsalvaterra | ||||||||||||||
| Version: | unspecified | ||||||||||||||||
| Hardware: | Other | ||||||||||||||||
| OS: | All | ||||||||||||||||
| Whiteboard: | |||||||||||||||||
| i915 platform: | i915 features: | ||||||||||||||||
| Attachments: |
|
||||||||||||||||
|
Description
Bruno
2012-11-20 22:49:18 UTC
Same here on my MacbookPro(6,2) with NV50 graphics. Here's an example error line:

[ 182.772471] nouveau E[ PFB][0000:01:00.0] trapped read at 0x000070ea38 on channel 0x0001fed0 BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT

I can narrow the version down further: 3.6.10 works, 3.7.0 does not.

Created attachment 71608 [details]
dmesg
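
For anyone triaging a log like the attached one, here is a minimal sketch (not part of the original report) that tallies the trap messages by operation, channel and reason. The regular expression is an assumption derived only from the error lines quoted in this thread, so other nouveau message variants may not match. `summarize_traps` and `SAMPLE_LINES` are hypothetical names used for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the lines quoted in this bug report (an assumption,
# not taken from the nouveau source): operation, address, channel, reason.
TRAP_RE = re.compile(
    r"trapped (?P<op>read|write) at 0x(?P<addr>[0-9a-f]+) "
    r"on channel 0x(?P<chan>[0-9a-f]+)"
    r".*? reason: (?P<reason>\w+)"
)


def summarize_traps(lines):
    """Count trap messages grouped by (operation, channel, reason)."""
    counts = Counter()
    for line in lines:
        match = TRAP_RE.search(line)
        if match:
            counts[(match["op"], match["chan"], match["reason"])] += 1
    return counts


# Lines copied from comments in this thread; the last one is not a trap
# message and should be ignored by the parser.
SAMPLE_LINES = [
    "[ 182.772471] nouveau E[ PFB][0000:01:00.0] trapped read at 0x000070ea38 "
    "on channel 0x0001fed0 BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT",
    "[ 521.630177] nouveau E[ PFB][0000:02:00.0] trapped write at 0x00004025c0 "
    "on channel 0x0000fee0 [unknown] BAR/PFIFO_WRITE/FB reason: PAGE_NOT_PRESENT",
    "trapped read at 0x000040fdd0 on channel 0x0000fee0 "
    "BAR/PFIFO_READ/FB reason: VRAM_LIMIT",
    "[ 521.632255] nouveau E[ PFIFO][0000:02:00.0] still angry after 101 spins, halt",
]

if __name__ == "__main__":
    for (op, chan, reason), n in sorted(summarize_traps(SAMPLE_LINES).items()):
        print(f"{n:6d}  {op:5s}  channel 0x{chan}  {reason}")
```

Feeding it the output of `dmesg` instead of `SAMPLE_LINES` should make it easier to see whether a given machine hits PAGE_NOT_PRESENT, VRAM_LIMIT, or both, which is one of the differences reporters note in this thread.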
This also happens on my nVIDIA ION (MCP79) system; Unity Dash graphics become severely corrupted. I'm currently running Ubuntu 12.10 with the xorg edgers ppa. My dmesg is attached.

I can provoke this bug with a simple drmModeSetCursor() or drmModeMoveCursor(). The cursor images have a horizontal black stripe (not always). Position varies on my machine. Using 3D acceleration without cursors works perfectly well (although starting mplayer caused a deadlock on my machine). It's also a nv50 card. I tried bisecting it and it turns out the memory-manager rewrite caused it (as reported on IRC). I was unable to revert the commit on top of 3.7 as it is quite complex.

Rui Salvaterra: your bug is completely different from the others

Indeed it is, I'm sorry for the noise. Wrong dmesg notwithstanding, I sometimes also get a lot of PAGE_NOT_PRESENT errors on this very same machine.

This affects my MacBookPro6,2 with GT330 GPU as well. Exact same symptoms and error messages as the reporter. Last working kernel for me is 3.5.0-17 Ubuntu. I have tried 3.7.4 and various 3.8 rc releases, and all show the same issue.

This also affects my MacbookPro7,1 with an MCP89 (NVAF). The same symptoms: the middle of the cursor is transparent, and if I proceed to log in directly (Gnome 3), it locks up eventually. One difference is the reason: VRAM_LIMIT instead of PAGE_NOT_PRESENT. The channel and address range seem to match the original submitter's dmesg. The workaround to suspend and resume also seems to work for me on 3.7. It started happening from 3.7 onwards. I tried 3.8-rc4, and the bug's still there (however resume doesn't work on 3.8-rc4, probably something else wrong).

Created attachment 73708 [details]
zcat /proc/config.gz > config
Also happens on my MacBookPro6,2.

lspci -nn | grep VGA:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT216 [GeForce GT 330M] [10de:0a29] (rev a2)

from dmesg:
[ 1678.891945] nouveau E[ PFB][0000:01:00.0] trapped read at 0x00007139c0 on channel 0x0001fed0 BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT

Workaround mentioned with suspend/resume also works for me. Running kernel config attached.

I also have this issue on a MacBook5,1. Bottom half of mouse cursor is transparent. There are thousands of "trapped read at ... on channel ... BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT" with occasional "trapped write"s. All issues resolve after resuming from suspend.

Running Arch Linux (kernel 3.7.5 x86_64)
xf86-video-nouveau 1.0.6
xorg-server 1.12.2
libdrm 2.4.41

Created attachment 74513 [details] [review]
limit pmem to 64kB

does it change anything?

(In reply to comment #12)
> Created attachment 74513 [details] [review]
> limit pmem to 64kB
>
> does it change anything?

No change for me (applied on top of linux-3.7.6, MacBookPro(2,1)). The error seems very much related to the cursor itself. Errors are only generated on cursor move/change.

Created attachment 74545 [details] [review]
initialize pmem offset

what about this one?

(In reply to comment #14)
> Created attachment 74545 [details] [review]
> initialize pmem offset
>
> what about this one?

No change with this one (instead of the previous one) either. Now, as ever since the bug appeared, there is sporadically a write error among the many read errors (same error message but s/read/write/).

Just to be sure we are not chasing two different bugs - can you confirm 3863c9bc887e9638a9d905d55f6038641ece78d6 is the commit which introduced *this* bug?

(In reply to comment #16)
> Just to be sure we are not chasing two different bugs - can you confirm
> 3863c9bc887e9638a9d905d55f6038641ece78d6 is the commit which introduced
> *this* bug?
I can't quickly check; both 3863c9bc887e9638a9d905d55f6038641ece78d6 and preceding 8a9b889e668a5bc2f4031015fe4893005c43403d don't compile here (gcc 4.6.3). They are failing on redeclaration of NV_* enums in nouveau_drv.h which were previously declared in core/include/core/device.h and struct nouveau_engine which was previously declared in core/include/core/engine.h. Will look tomorrow evening (CET) at Git history around those commits in order to get it compiling so I can check.

I checked at commit aa4cc5d274c09909fe32861825c2377d0ccb3bfd and the bug is not yet present.
I checked at commit 9274f4a9ba7e70d1770e237fca16d52f27f0c728 and the bug is not yet present.
Kernel does not build starting with commit 9458029940ffc64bca0c5a30ea626c377205842e (fails on the redeclarations mentioned in comment #17).
Commit 77145f1cbdf8d28b46ff8070ca749bad821e0774 is the first one to build again and is affected by the bug.

The duplicate NV_* enum may be easy to fix, though the struct nouveau_engine seems harder to correct.

Weird, 3863c9bc887e9638a9d905d55f6038641ece78d6 compiles fine here on both gcc 4.7.2 and 4.6.3. Can you attach compile log?

Created attachment 74862 [details]
Kernel config + compile log for 3863c9bc887e9638a9d905d55f6038641ece78d6
This includes the build log from an attempted rebuild of the kernel after touching all files under drivers/gpu/drm/nouveau.
The kernel config precedes the build log in the same file.
For the whole range of commits where compilation fails the errors are the same though I did not take extra care to check if it was always failing for the same source files.
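
The bisection results reported in comment #18 can be read as narrowing the set of commits that could have introduced the bug, with the unbuildable commits left undecided (much as `git bisect skip` does). A minimal sketch of that reasoning follows. It assumes a linear history and assumes the relative order of the commits shown, which the thread does not fully state. `first_bad_candidates` is a hypothetical helper, not a git command.

```python
def first_bad_candidates(history):
    """Return commits that may be the first bad one.

    history: list of (commit, status) tuples, oldest first, where status is
    'good', 'bad' or 'skip' (could not be built or tested).
    """
    last_good = -1
    first_bad = None
    for index, (_, status) in enumerate(history):
        if first_bad is not None:
            break
        if status == "good":
            last_good = index
        elif status == "bad":
            first_bad = index
    if first_bad is None:
        return []
    # Everything after the last known-good commit, up to and including the
    # first known-bad commit, remains a suspect.
    return [commit for commit, _ in history[last_good + 1:first_bad + 1]]


# Out-of-tree results from this thread (hashes abbreviated; order assumed).
OUT_OF_TREE = [
    ("aa4cc5d", "good"),
    ("9274f4a", "good"),
    ("9458029", "skip"),  # does not build out-of-tree
    ("8a9b889", "skip"),  # does not build out-of-tree
    ("3863c9b", "skip"),  # does not build out-of-tree
    ("77145f1", "bad"),
]

# Same history once the two commits in question were built in-tree.
IN_TREE = [
    ("aa4cc5d", "good"),
    ("9274f4a", "good"),
    ("9458029", "skip"),
    ("8a9b889", "good"),
    ("3863c9b", "bad"),
    ("77145f1", "bad"),
]

if __name__ == "__main__":
    print(first_bad_candidates(OUT_OF_TREE))
    print(first_bad_candidates(IN_TREE))
```

With only the out-of-tree results, every commit from the first unbuildable one up to 77145f1 remains a suspect. Once 8a9b889 and 3863c9b can be tested, the candidates collapse to 3863c9b alone, which matches the confirmation given later in this report.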
Created attachment 74935 [details] [review]
compilation fix

I'm not sure why I can't hit it, but this patch will probably resolve your compilation problem...

(In reply to comment #21)
> Created attachment 74935 [details] [review]
> compilation fix
>
> I'm not sure why I can't hit it, but this patch will probably resolve your
> compilation problem...

The patch does not fix compilation for me. Are you building in-tree or out-of-tree? I'm building out-of-tree.

in-tree

It seems that there is a fix for this issue in nouveau/master, though that kernel stalls on click to open menu in enlightenment 0.17.1 [compositing enabled]... And once the stall has happened, the GPU seems rather confused: OS X won't successfully start anymore and older Linux kernels show incorrectly tiled content with nouveau (nouveaufb) while EFIFB gets things displayed as usual.

Note that one recent commit in xf86-video-nouveau (http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=912d418fdfd2e99eef1e5c631c76dda1d82cf451) may reduce visibility of this bug... Did you update xf86-video-nouveau too?

(In reply to comment #23)
> in-tree

Yes, compiling in-tree does work... so include search paths are somehow broken between those two revisions I mentioned in comment #18 when building out-of-tree. Not so good.

(In reply to comment #25)
> Note that one recent commit in xf86-video-nouveau
> (912d418fdfd2e99eef1e5c631c76dda1d82cf451) may reduce visibility of this
> bug... Did you update xf86-video-nouveau too?

No, I just updated kernel. For userspace I'm still on xf86-video-nouveau-1.0.4, libdrm-2.4.40, mesa-9.0.1 and xorg-server-1.13.1.

The nouveau/master commit I tried was 43b629c047... (drm/nouveau: Fix DPMS 1 on G4 Snowball, from snow white to coal black) by Stefan de Konink.

(In reply to comment #16)
> Just to be sure we are not chasing two different bugs - can you confirm
> 3863c9bc887e9638a9d905d55f6038641ece78d6 is the commit which introduced
> *this* bug?
Bug present in 3863c9bc887e9638a9d905d55f6038641ece78d6.
Bug not present in preceding 8a9b889e668a5bc2f4031015fe4893005c43403d.
So yes, I can confirm.

Bug is still present with linux-3.8.4, xf86-video-nouveau-1.0.7

Sorry, last comment should read bug is *no longer* present for me with latest kernel and drivers.

I cannot reproduce it, either. linux-3.8.4 + libdrm-2.4.43 but I don't know what fixed it. Feel free to mark as fixed.

Well, with my MCP89 on 3.8.x, I still get:

trapped read at 0x000040fdd0 on channel 0x0000fee0 BAR/PFIFO_READ/FB reason: VRAM_LIMIT

On 3.9-rc4 it's even worse: complete screen corruption and hard lock. Is mine a different bug?

I do get lock-ups with 3.8.4 and 3.9-rc4. 3.9-rc4 does not have the corrupted cursor anymore but still has lots of trapped reads or writes. Kernel is still alive at lockup time and kernel log reports

...
[ 521.627844] nouveau E[ PFB][0000:02:00.0] trapped read at 0x000040fe2c on channel 0x0000fee0 [unknown] BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT
[ 521.627864] nouveau E[ PFB][0000:02:00.0] trapped read at 0x000040fe34 on channel 0x0000fee0 [unknown] BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT
[ 521.627882] nouveau E[ PFB][0000:02:00.0] trapped read at 0x000040fe3c on channel 0x0000fee0 [unknown] BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT
[ 521.630177] nouveau E[ PFB][0000:02:00.0] trapped write at 0x00004025c0 on channel 0x0000fee0 [unknown] BAR/PFIFO_WRITE/FB reason: PAGE_NOT_PRESENT
[ 521.632255] nouveau E[ PFIFO][0000:02:00.0] still angry after 101 spins, halt
[ 521.632273] nouveau E[ PFB][0000:02:00.0] trapped read at 0x00004025c0 on channel 0x0000fee0 [unknown] BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT
[ 524.640022] [sched_delayed] sched: RT throttling activated

Still broken for me on kernel 3.8.11-1 and xf86-video-nouveau 1.0.7-1 with the GeForce GT 330M.

3.9.0 with xf86-video-nouveau-1.0.7, libdrm-2.4.44 and mesa-9.1.2 seems to behave properly (no broken cursor and no log spamming).
Not sure which part of the updates did the trick.

There are still occasional complaints showing up, e.g. on VT switch. Will attach that log later on, hopefully with some indication of which action triggers which messages.

(In reply to comment #33)
> 3.9.0 with xf86-video-nouveau-1.0.7, libdrm-2.4.44 and mesa-9.1.2 seems to
> behave properly (no broken cursor and no log spamming).
>
> Not sure which part of the updates did the trick.
>
> There are still occasional complaints showing up, e.g. on VT switch. Will attach
> that log later on, hopefully with some indication of which action triggers
> which messages.

Ah. Well. Yes, I'm not getting log spamming or a broken mouse cursor with 3.9 or 3.8.11, but the corruption on the console (pre X) that started at the same time still persists. Perhaps not the same issue, but I'd be fairly certain it's closely related somehow. So I guess this ticket can be closed if there's another one open for the console graphics corruption?

Closing as fixed per the comments. There are several NVAF-related bugs currently open, feel free to subscribe to them. (I've requested retests on them, although haven't heard back yet.) |