Bug 57350 - [nouveau, linux-3.7-rc] Broken cursor and kernel log swamped with trapped reads/writes from BAR/PFIFO_READ/FB
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: Other All
Importance: medium normal
Assigned To: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2012-11-20 22:49 UTC by Bruno
Modified: 2013-08-31 05:49 UTC (History)

Attachments
dmesg (190.68 KB, text/plain), 2012-12-16 23:29 UTC, Rui Salvaterra
zcat /proc/config.gz > config (128.40 KB, text/plain), 2013-01-27 04:13 UTC, alzeih
limit pmem to 64kB (1.30 KB, patch), 2013-02-09 21:34 UTC, Marcin Slusarz
initialize pmem offset (499 bytes, patch), 2013-02-10 13:58 UTC, Marcin Slusarz
Kernel config + compile log for 3863c9bc887e9638a9d905d55f6038641ece78d6 (81.57 KB, text/plain), 2013-02-15 09:54 UTC, Bruno
compilation fix (470 bytes, patch), 2013-02-16 12:10 UTC, Marcin Slusarz

Description Bruno 2012-11-20 22:49:18 UTC
On a MacBookAir with a GeForce 9400M (C79), when X starts after booting (using Enlightenment with compositing), a horizontal bar in the middle of the cursor (about 1/3 of the cursor height) is transparent instead of showing the cursor.

At the same time, the kernel log is being filled with the following error:
nouveau E[     PFB][0000:02:00.0] trapped read at $addr on channel $fixed_addr BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT


$fixed_addr is always the same (0x0000fee0), while $addr changes each time (e.g. 0x000040fe00, then 0x000040fe0c, then 0x000040fe14, and it keeps incrementing).
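
For anyone who wants to check whether their own log shows the same pattern, here is a minimal Python sketch that pulls the trap lines out of kernel log text, counts them per channel, and prints the step between successive read addresses. The regular expression is only inferred from the lines quoted in this report (including the optional "[unknown]" field seen in some later kernels), and the sample lines are rebuilt from the template and values above, so treat both as assumptions rather than a description of the driver's exact output.

```python
import re
from collections import Counter

# Pattern inferred from the PFB trap lines quoted in this report.
# The optional "[name] " group after the channel is an assumption that
# covers the "[unknown]" field some later kernels print.
TRAP_RE = re.compile(
    r"nouveau E\[\s*PFB\]\[(?P<dev>[0-9a-f:.]+)\] "
    r"trapped (?P<kind>read|write) at 0x(?P<addr>[0-9a-f]+) "
    r"on channel 0x(?P<chan>[0-9a-f]+) (?:\[[^\]]*\] )?"
    r"(?P<unit>\S+) reason: (?P<reason>\w+)"
)


def parse_traps(lines):
    """Yield (kind, addr, chan, reason) for each line that looks like a trap."""
    for line in lines:
        m = TRAP_RE.search(line)
        if m:
            yield (m["kind"], int(m["addr"], 16), int(m["chan"], 16), m["reason"])


def summarize(lines):
    """Return (trap count per channel, deltas between successive read addresses)."""
    traps = list(parse_traps(lines))
    by_channel = Counter(chan for _, _, chan, _ in traps)
    reads = [addr for kind, addr, _, _ in traps if kind == "read"]
    deltas = [b - a for a, b in zip(reads, reads[1:])]
    return by_channel, deltas


# Illustrative sample: the error template from this report filled in with
# the three example addresses and the fixed channel value.
_PREFIX = "nouveau E[     PFB][0000:02:00.0] trapped read at "
_SUFFIX = " on channel 0x0000fee0 BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT"
SAMPLE = [_PREFIX + a + _SUFFIX for a in ("0x000040fe00", "0x000040fe0c", "0x000040fe14")]

if __name__ == "__main__":
    by_channel, deltas = summarize(SAMPLE)
    for chan, count in by_channel.items():
        print(f"channel 0x{chan:08x}: {count} traps")
    print("read address deltas:", [hex(d) for d in deltas])
```

To run it against a real log, pass the lines of saved dmesg output to `summarize()` instead of `SAMPLE`.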

Doing a suspend to RAM and resume makes the cursor display correctly, and the error message no longer appears in the kernel log.

Software versions:
 linux-3.7-rc5, linux-3.7-rc6
 xorg-server-1.13.0
 xf86-video-nouveau-1.0.4
 libdrm-2.4.40
 mesa-9.0

It was also the case with xorg-server-1.12.2, an xf86-video-nouveau Git snapshot from March 2012, libdrm-2.4.33 and mesa-8.0.3, but it did not happen with this setup on an older kernel (linux-3.5.x; I am no longer sure about linux-3.6.x).
Comment 1 Brandon Smith 2012-12-13 01:54:49 UTC
Same here on my MacbookPro(6,2) with NV50 graphics.

Here's an example error line:
[  182.772471] nouveau E[     PFB][0000:01:00.0] trapped read at 0x000070ea38 on channel 0x0001fed0 BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT

I can narrow the version down further:
3.6.10 works
3.7.0 does not.
Comment 2 Rui Salvaterra 2012-12-16 23:29:57 UTC
Created attachment 71608
dmesg
Comment 3 Rui Salvaterra 2012-12-16 23:34:06 UTC
This also happens on my nVIDIA ION (MCP79) system: Unity Dash graphics become severely corrupted. I'm currently running Ubuntu 12.10 with the xorg edgers ppa. My dmesg is attached.
Comment 4 David Herrmann 2012-12-17 01:26:33 UTC
I can provoke this bug with a simple drmModeSetCursor() or drmModeMoveCursor(). The cursor images have a horizontal black stripe (not always). Position varies on my machine. Using 3D acceleration without cursors works perfectly well (although starting mplayer caused a deadlock on my machine).

It's also a nv50 card.

I tried bisecting it and it turns out the memory-manager rewrite caused it (as reported on IRC). I was unable to revert the commit on top of 3.7 as it is quite complex.
Comment 5 Marcin Slusarz 2012-12-18 18:31:00 UTC
Rui Salvaterra: your bug is completely different from the others
Comment 6 Rui Salvaterra 2012-12-19 08:23:27 UTC
Indeed it is, I'm sorry for the noise. Wrong dmesg notwithstanding, I sometimes also get a lot of PAGE_NOT_PRESENT errors on this very same machine.
Comment 7 Parag 2013-01-22 15:52:34 UTC
This affects my MacBookPro6,2 with GT330 GPU as well. Exact same symptoms and error messages as the reporter. Last working kernel for me is 3.5.0-17 Ubuntu. I have tried 3.7.4 and various 3.8 rc releases and both show the same issue.
Comment 8 Mourad De Clerck 2013-01-27 01:58:21 UTC
This also affects my MacbookPro7,1 with an MCP89 (NVAF)

The same symptoms: the middle of the cursor is transparent, and if I proceed to login directly (Gnome 3), it locks up eventually.

One difference is the reason: VRAM_LIMIT instead of PAGE_NOT_PRESENT. The channel and address range seem to match the original submitter's dmesg.

The suspend-and-resume workaround also seems to work for me on 3.7.

It started happening from 3.7 onwards. I tried 3.8-rc4, and the bug's still there (however resume doesn't work on 3.8-rc4, probably something else wrong).
Comment 9 alzeih 2013-01-27 04:13:44 UTC
Created attachment 73708
zcat /proc/config.gz > config
Comment 10 alzeih 2013-01-27 04:15:46 UTC
Also happens on my MacBookPro6,2

lspci -nn | grep VGA:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GT216 [GeForce GT 330M] [10de:0a29] (rev a2)

from dmesg:

[ 1678.891945] nouveau E[     PFB][0000:01:00.0] trapped read at 0x00007139c0 on channel 0x0001fed0 BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT

Workaround mentioned with suspend/resume also works for me.

Running kernel config attached.
Comment 11 Ankur 2013-02-09 04:07:04 UTC
I also have this issue on a MacBook5,1

Bottom half of mouse cursor is transparent.

There are thousands of "trapped read at ... on channel ... BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT" with occasional "trapped write"s.

All issues resolve after resuming from suspend.

Running Arch Linux (kernel 3.7.5 x86_64)
xf86-video-nouveau 1.0.6
xorg-server 1.12.2
libdrm 2.4.41
Comment 12 Marcin Slusarz 2013-02-09 21:34:25 UTC
Created attachment 74513
limit pmem to 64kB

does it change anything?
Comment 13 Bruno 2013-02-09 22:48:14 UTC
(In reply to comment #12)
> Created attachment 74513
> limit pmem to 64kB
> 
> does it change anything?

No change for me (applied on top of linux-3.7.6, MacBookPro(2,1)).

The error seems very much related to the cursor itself.
Errors are only generated on cursor move/change.
Comment 14 Marcin Slusarz 2013-02-10 13:58:34 UTC
Created attachment 74545
initialize pmem offset

what about this one?
Comment 15 Bruno 2013-02-10 15:24:21 UTC
(In reply to comment #14)
> Created attachment 74545
> initialize pmem offset
> 
> what about this one?

No change with this one (instead of the previous one) either.


Both now and ever since the bug appeared, a write error sporadically shows up among the many read errors (same error message but s/read/write/).
Comment 16 Marcin Slusarz 2013-02-12 20:08:45 UTC
Just to be sure we are not chasing two different bugs - can you confirm 3863c9bc887e9638a9d905d55f6038641ece78d6 is the commit which introduced *this* bug?
Comment 17 Bruno 2013-02-12 21:41:23 UTC
(In reply to comment #16)
> Just to be sure we are not chasing two different bugs - can you confirm
> 3863c9bc887e9638a9d905d55f6038641ece78d6 is the commit which introduced
> *this* bug?

I can't quickly check, both 3863c9bc887e9638a9d905d55f6038641ece78d6 and preceding 8a9b889e668a5bc2f4031015fe4893005c43403d don't compile here (gcc 4.6.3).

They are failing on redeclaration of NV_* enums in nouveau_drv.h which were previously declared in core/include/core/device.h and struct nouveau_engine which was previously declared in core/include/core/engine.h

Will look tomorrow evening (CET) at GIT history around those commits in order to get it compiling so I can check.
Comment 18 Bruno 2013-02-14 21:31:35 UTC
I checked at commit aa4cc5d274c09909fe32861825c2377d0ccb3bfd and the bug is not yet present.

I checked at commit 9274f4a9ba7e70d1770e237fca16d52f27f0c728 and the bug is not yet present.

Kernel does not build starting with commit 9458029940ffc64bca0c5a30ea626c377205842e (fails on the redeclarations mentioned in comment #17).

Commit 77145f1cbdf8d28b46ff8070ca749bad821e0774 is the first one to build again and is affected by the bug.
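
To make the consequence of the unbuildable range explicit, here is a small Python sketch of the reasoning: given test results in history order, every commit after the last known-good one, up to and including the first known-bad one, remains a candidate for having introduced the bug. The short hashes are abbreviations of the commits named in this comment, their relative order is assumed from the wording above, and the "<untested, does not build>" entry is a placeholder for the commits in between, not a real commit id. This is roughly the situation `git bisect skip` reports when the untestable commits sit right next to the first bad one.

```python
from typing import List, Optional, Tuple

# (short hash, result) in assumed history order, oldest first.
# "skip" marks a commit that could not be built and therefore not tested.
RESULTS: List[Tuple[str, str]] = [
    ("aa4cc5d", "good"),
    ("9274f4a", "good"),
    ("9458029", "skip"),
    ("<untested, does not build>", "skip"),  # placeholder, not a commit id
    ("77145f1", "bad"),
]


def first_bad_candidates(results: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Return the commits that could be the first bad one.

    Assumes the bug, once introduced, stays present in all later commits.
    Returns None if no bad commit follows the last good one.
    """
    last_good = -1
    for i, (_, status) in enumerate(results):
        if status == "good":
            last_good = i
    for i in range(last_good + 1, len(results)):
        if results[i][1] == "bad":
            return [commit for commit, _ in results[last_good + 1 : i + 1]]
    return None


if __name__ == "__main__":
    print(first_bad_candidates(RESULTS))
```

With the results above, the candidate list spans the whole unbuildable range plus 77145f1, which is why getting the intermediate commits to compile matters for pinning down the offending change.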


The duplicate NV_* enum may be easy to fix though the struct nouveau_engine seems harder to correct.
Comment 19 Marcin Slusarz 2013-02-14 22:07:08 UTC
Weird, 3863c9bc887e9638a9d905d55f6038641ece78d6 compiles fine here on both gcc 4.7.2 and 4.6.3. Can you attach compile log?
Comment 20 Bruno 2013-02-15 09:54:04 UTC
Created attachment 74862
Kernel config + compile log for 3863c9bc887e9638a9d905d55f6038641ece78d6

This includes the build log from trying to rebuild the kernel after touching all files under drivers/gpu/drm/nouveau.

The kernel config precedes the build log in the same file.

For the whole range of commits where compilation fails, the errors are the same, though I did not take extra care to check whether it was always failing on the same source files.
Comment 21 Marcin Slusarz 2013-02-16 12:10:48 UTC
Created attachment 74935
compilation fix

I'm not sure why I can't hit it, but this patch will probably resolve your compilation problem...
Comment 22 Bruno 2013-02-17 10:06:46 UTC
(In reply to comment #21)
> Created attachment 74935
> compilation fix
> 
> I'm not sure why I can't hit it, but this patch will probably resolve your
> compilation problem...

The patch does not fix compilation for me.

Are you building in-tree or out-of-tree?
I'm building out-of-tree.
Comment 23 Marcin Slusarz 2013-02-17 11:25:05 UTC
in-tree
Comment 24 Bruno 2013-02-17 17:44:05 UTC
It seems that there is a fix for this issue in nouveau/master, though that kernel stalls on a click to open a menu in enlightenment 0.17.1 [compositing enabled]...

And once the stall has happened, the GPU seems rather confused: OSX won't successfully start anymore, and older Linux kernels show incorrectly tiled content with nouveau (nouveaufb), while EFIFB displays things as usual.
Comment 25 Marcin Slusarz 2013-02-17 18:16:42 UTC
Note that one recent commit in xf86-video-nouveau (http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=912d418fdfd2e99eef1e5c631c76dda1d82cf451) may reduce visibility of this bug... Did you update xf86-video-nouveau too?
Comment 26 Bruno 2013-02-18 19:44:17 UTC
(In reply to comment #23)
> in-tree

Yes, compiling in-tree does work... so the include search paths are somehow broken when building out-of-tree between those two revisions I mentioned in comment #18. Not so good.

(In reply to comment #25)
> Note that one recent commit in xf86-video-nouveau
> (912d418fdfd2e99eef1e5c631c76dda1d82cf451) may reduce visibility of this
> bug... Did you update xf86-video-nouveau too?

No, I just updated kernel. For userspace I'm still on xf86-video-nouveau-1.0.4, libdrm-2.4.40, mesa-9.0.1 and xorg-server-1.13.1
The nouveau/master commit I tried was 43b629c047... (drm/nouveau: Fix DPMS 1 on G4 Snowball, from snow white to coal black) by Stefan de Konink.

(In reply to comment #16)
> Just to be sure we are not chasing two different bugs - can you confirm
> 3863c9bc887e9638a9d905d55f6038641ece78d6 is the commit which introduced
> *this* bug?

Bug present in 3863c9bc887e9638a9d905d55f6038641ece78d6.
Bug not present in preceding 8a9b889e668a5bc2f4031015fe4893005c43403d.

So yes, I can confirm.
Comment 27 Ankur 2013-03-29 00:25:04 UTC
Bug is still present with linux-3.8.4, xf86-video-nouveau-1.0.7
Comment 28 Ankur 2013-03-29 08:55:55 UTC
Sorry, last comment should read bug is *no longer* present for me with latest kernel and drivers.
Comment 29 David Herrmann 2013-03-29 18:42:59 UTC
I cannot reproduce it, either. linux-3.8.4 + libdrm-2.4.43 but I don't know what fixed it.
Feel free to mark as fixed.
Comment 30 Mourad De Clerck 2013-03-29 21:32:53 UTC
Well, with my MCP89 on 3.8.x, I still get:

trapped read at 0x000040fdd0 on channel 0x0000fee0 BAR/PFIFO_READ/FB reason: VRAM_LIMIT

On 3.9-rc4 it's even worse: complete screen corruption and hard lock.

Is mine a different bug?
Comment 31 Bruno 2013-03-29 21:56:01 UTC
I do get lock-ups with 3.8.4 and 3.9-rc4.

3.9-rc4 does not have the corrupted cursor anymore but still has lots of trapped reads or writes.

Kernel is still alive at lockup time and kernel log reports
...
[  521.627844] nouveau E[     PFB][0000:02:00.0] trapped read at 0x000040fe2c on channel 0x0000fee0 [unknown] BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT
[  521.627864] nouveau E[     PFB][0000:02:00.0] trapped read at 0x000040fe34 on channel 0x0000fee0 [unknown] BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT
[  521.627882] nouveau E[     PFB][0000:02:00.0] trapped read at 0x000040fe3c on channel 0x0000fee0 [unknown] BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT
[  521.630177] nouveau E[     PFB][0000:02:00.0] trapped write at 0x00004025c0 on channel 0x0000fee0 [unknown] BAR/PFIFO_WRITE/FB reason: PAGE_NOT_PRESENT
[  521.632255] nouveau E[   PFIFO][0000:02:00.0] still angry after 101 spins, halt
[  521.632273] nouveau E[     PFB][0000:02:00.0] trapped read at 0x00004025c0 on channel 0x0000fee0 [unknown] BAR/PFIFO_READ/FB reason: PAGE_NOT_PRESENT
[  524.640022] [sched_delayed] sched: RT throttling activated
Comment 32 alzeih 2013-05-07 19:20:36 UTC
Still broken for me on kernel 3.8.11-1 and xf86-video-nouveau 1.0.7-1 with the GeForce GT 330M.
Comment 33 Bruno 2013-05-07 20:23:43 UTC
3.9.0 with xf86-video-nouveau-1.0.7, libdrm-2.4.44 and mesa-9.1.2 seems to behave properly (no broken cursor and no log spamming).

Not sure which part of the updates did the trick.

There are still occasional complaints showing up, e.g. on VT switch. I will attach that log later on, hopefully with some indication of which action triggers which messages.
Comment 34 alzeih 2013-05-07 20:51:40 UTC
(In reply to comment #33)
> 3.9.0 with xf86-video-nouveau-1.0.7, libdrm-2.4.44 and mesa-9.1.2 seems to
> behave properly (no broken cursor and no log spamming).
> 
> Not sure which part of the updates did the trick.
> 
> There is still seldom complaints showing up, e.g. on VT switch. Will attach
> that log later on hopefully with some indication of which action triggers
> which messages.

Ah. Well.

Yes, I'm not getting log spamming or a broken mouse cursor with 3.9 or 3.8.11, but the corruption on the console (pre X) that started at the same time still persists. 

Perhaps not the same issue, but I'd be fairly certain it's related closely somehow. So I guess this ticket can be closed if there's another one open for the console graphics corruption?
Comment 35 Ilia Mirkin 2013-08-31 05:49:42 UTC
Closing as fixed per the comments. There are several NVAF-related bugs currently open, feel free to subscribe to them. (I've requested retests on them, although haven't heard back yet.)