Bug 33887 - nouveau causes graphics corruption where you can't do anything
Summary: nouveau causes graphics corruption where you can't do anything
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: 7.6 (2010.12)
Hardware: x86 (IA32) Linux (All)
Importance: medium blocker
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2011-02-03 19:15 UTC by zeruke
Modified: 2011-04-30 14:29 UTC

Attachments
snapshot of graphic corruption (299.74 KB, image/jpeg)
2011-02-03 19:15 UTC, zeruke
possible fix for nv4x/nv6x chipsets (1.31 KB, patch)
2011-02-06 14:29 UTC, Ben Skeggs
kernel-38-rc4 with nouveau.noaccel=1 (628.61 KB, image/jpeg)
2011-02-09 07:37 UTC, Ronald
kernel-38-rc4 without nouveau.noaccel=1 (702.01 KB, image/jpeg)
2011-02-09 07:39 UTC, Ronald
after patch with res at 1280x800(16:10) (176.08 KB, image/jpeg)
2011-02-10 23:53 UTC, zeruke

Description zeruke 2011-02-03 19:15:58 UTC
Created attachment 42914 [details]
snapshot of graphic corruption

With Ubuntu on an HP Pavilion dv6605us (dv6500) with built-in GeForce 7150M / nForce 630M graphics, the graphics corruption is so bad that only the mouse cursor displays correctly, as shown in the attached snapshot.

The only way to get past it is to disable or completely remove nouveau.


I have also filed a report on Ubuntu's Launchpad: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/711591
Comment 1 berbae 2011-02-04 05:59:12 UTC
I have the same display corruption with :

nVidia Corporation C73 [GeForce 7100 / nForce 630i] (rev a2)

Linux arch64 2.6.37-ARCH #1 SMP PREEMPT Sat Jan 29 20:00:33 CET 2011 x86_64 Pentium(R) Dual-Core CPU E5200 @ 2.50GHz GenuineIntel GNU/Linux

In the Arch Linux forum thread :
https://bbs.archlinux.org/viewtopic.php?id=112758

other people mentioned the same problem.
It seems to concern NV3x and NV4x chipsets.

It happens with the upgrades :

libdrm 2.4.22 -> 2.4.23
libgl 7.9.0.git20101207 -> 7.10
libva 1.0.6 -> 1.0.8
mesa 7.9.0.git20101207 -> 7.10
xf86-video-nouveau 0.0.16_git20100819 -> 0.0.16_git20101217

See also the bug report in the Arch Linux flyspray :
https://bugs.archlinux.org/task/22700?project=1

Everything seems normal in the log files, and the processes are running OK; the X server and the WM processes start normally.
Only the mouse cursor works, and the display is totally unusable, as shown in the screenshots.

Downgrading to the previous release restores the display to normal.

Please look into this horrible regression and fix it, because the recent release of the nouveau driver is totally unusable.
Comment 2 Tomasz Wasiak 2011-02-04 22:53:50 UTC
This is the same corruption I am seeing on GeForce 6100/nForce 430 (bug #33688).

Try using NoAccel or ShadowFB option in xorg.conf and check if that helps...
Comment 3 Tomasz Wasiak 2011-02-04 23:00:44 UTC
(In reply to comment #2)
> This is same corruption I am having on GeForce 6100/nForce 430 (bug #33688).
> 
> Try using NoAccel or ShadowFB option in xorg.conf and check if that helps...

I do not know why Bugzilla links to the wrong bug (I meant https://bugs.freedesktop.org/show_bug.cgi?id=33668)...
Comment 4 zeruke 2011-02-05 01:32:21 UTC
@Tomasz Wasiak

Neither of those options works, but I did find this message on tty1:

[    8.520202] [drm] nouveau 0000:00:12.0: ======= misaligned reg 0x001020FB =======
[    8.520217] [drm] nouveau 0000:00:12.0: ======= misaligned reg 0x001020FB =======
Comment 5 Maarten Maathuis 2011-02-05 05:26:01 UTC
My guess is that something is wrong with tiled scanout. This was enabled with commit http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=c88f13e25b0040c1dd0f93e0ac40f62a6005ce59

You're not the first to complain, and it may be good to gather the "real" names of the problem cards.

In my log I have:
[drm] nouveau 0000:01:00.0: Detected an NV50 generation card (0x096000c1)

Yours should report an NV40 or NV44 generation card; the 96 in my case means that I have an NV96.
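
As a rough illustration of how to read that log line, here is a minimal Python sketch. It assumes (this is not stated in this report) that the chipset number sits in bits 20-27 of the hexadecimal value printed in parentheses, which is consistent with 0x096000c1 being read as NV96. The names DETECT_RE and decode_detect_line are made up for this example.

```python
import re

# Matches the nouveau dmesg line quoted above, for example:
# "[drm] nouveau 0000:01:00.0: Detected an NV50 generation card (0x096000c1)"
DETECT_RE = re.compile(
    r"Detected an (NV[0-9A-Fa-f]+) generation card \(0x([0-9A-Fa-f]{8})\)"
)


def decode_detect_line(line):
    """Return (generation, chipset) for a detection line, or None if it does not match."""
    match = DETECT_RE.search(line)
    if match is None:
        return None
    generation = match.group(1)
    value = int(match.group(2), 16)
    # Assumption: the chipset id occupies bits 20-27 of the logged value.
    chipset = (value >> 20) & 0xFF
    return generation, "NV%02X" % chipset


if __name__ == "__main__":
    line = "[drm] nouveau 0000:01:00.0: Detected an NV50 generation card (0x096000c1)"
    print(decode_detect_line(line))
```

Piping `dmesg | grep generation` through a helper like this gives both the generation name and the specific chipset, which is the "real" name worth reporting here.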
Comment 6 Xavier 2011-02-05 05:46:57 UTC
List of cards reported so far; I looked up the codenames on http://nouveau.freedesktop.org/wiki/CodeNames

GeForce 7150m / nForce 630M - nv67
GeForce 7100 / nForce 630i - nv67?
GeForce 6100 / nForce 430 - nv4e
GeForce 6150SE nForce 430 - nv4c
GeForce FX Go5700 - nv36
Comment 7 berbae 2011-02-05 06:13:02 UTC
For the GeForce 7100 / nForce 630i I have in the log :

[drm] nouveau 0000:00:10.0: Detected an NV40 generation card (0x063000a2)

From http://en.wikipedia.org/wiki/GeForce_7_Series :

"The 7100 series was introduced on August 30, 2006 and is based on GeForce 6200 Series architecture."
and
"it is little more than a revamped version of the GeForce 6200TC"

I presume that's why it is considered an NV40 chipset.
Comment 8 berbae 2011-02-05 06:30:58 UTC
Sorry I just noticed in the list from http://nouveau.freedesktop.org/wiki/CodeNames :

NV63    GeForce 7100 / nForce 630i

In dmesg I have :

[drm] nouveau 0000:00:10.0: Detected an NV40 generation card (0x063000a2)

But in Xorg.0.log I do indeed have :

[   133.128] (--) NOUVEAU(0): Chipset: "NVIDIA NV63"

and also :

[   133.472] (II) NOUVEAU(0): [XvMC] Associated with NV40 texture adapter.

So it is not clear to me what chipset it is.
Comment 9 Maarten Maathuis 2011-02-05 06:32:04 UTC
NV6X is just because there were no numbers left in NV4X :)
Comment 10 Maarten Maathuis 2011-02-05 06:43:20 UTC
I think an mmio trace (http://nouveau.freedesktop.org/wiki/MmioTrace) of all the problematic cards running the closed source driver should shed some light on what is wrong with the tiling code on these cards, because I suspect the blob uses a tiled frontbuffer too.

You can send them to the email address mentioned at the bottom of the wiki page.

I'm not the best person to look at this (I don't use that generation of hardware anymore, for example), but I'll do what I can if no one steps up.
Comment 11 zeruke 2011-02-05 13:18:39 UTC
@ Maarten Maathuis

I would do the mmio trace if I could, but until nvidia updates the beta driver to support the new xorg stuff, I won't be able to use the closed source drivers unless I downgrade, which I don't really want to do at the moment.
Comment 12 Xavier 2011-02-05 13:43:41 UTC
(In reply to comment #11)
> @ Maarten Maathuis
> 
> i would do the mmio trace if i could but right now until nvidia updates the
> beta driver to support the new xorg stuff then i wont be able to use the closed
> source drivers unless i downgrade which at the moment i don't really want to do

Which xorg version and which nvidia version are you using ?
http://nouveau.freedesktop.org/wiki/BlobVersions
Comment 13 zeruke 2011-02-05 14:07:47 UTC
(In reply to comment #12)

Right now, because I'm using Ubuntu 11.04 alpha 2, the xserver is 1.9.99.901+git20110131.be3be768-0ubuntu3, which is seen as xserver 1.10. nvidia only has preliminary support for it in 270.18, and 270.18 currently has a problem with the ABI; if I set ignoreABI, I get segfaults. This is a known problem that should be fixed in the next release.

So right now I'm not running the nvidia drivers. I'm using the basic xorg graphics, because I have to pass modeset=0 to nouveau to see things correctly.
Comment 14 berbae 2011-02-06 03:24:26 UTC
Isn't it possible to compare the previous release with the latest one, list the changes made, and see which patches or changes could have caused the regression ?
Isn't it possible to revert some changes to their previous state ?
Comment 15 Xavier 2011-02-06 04:12:23 UTC
You are not listening, we already know what commit broke it :
http://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=c88f13e25b0040c1dd0f93e0ac40f62a6005ce59

Now we want to know why tiled scanout does not work with these nforce boards, and we need an mmiotrace for that.
Comment 16 Tomasz Wasiak 2011-02-06 06:29:48 UTC
Unfortunately, the mentioned commit is not the only issue.
Using revision 38e8809bb415bae5c182fc79c8fc62992c5e4ed0 patched not to use tiled scanout helps only a bit with the current master branch of mesa...
You need to switch to the mesa-7.9 branch to have X working normally without major screen corruption (unfortunately there are still some minor corruptions here and there, but you can live with them...).
Still, I only got 2D acceleration working (I know 3D is not supported :-D) - the screen (or window) is totally messed up even when launching the glxgears demo using the Gallium3D nouveau driver.

I have tried nearly all revisions of xf86-video-nouveau (from 4063616938f76af8028491276039d422c0782b1b dated April 9th 2010 to the current one) built on top of the current master branch of mesa, with the same major screen corruption!
Of course most of them need some patches so as not to lock up the GPU when built on top of current versions of libdrm/mesa/xorg-server, but I have carefully checked whether those patches could be the source of the screen corruption issues.
Comment 17 berbae 2011-02-06 12:37:54 UTC
I read the mmiotrace.txt file on how to use the kernel functionality.
Can you tell me exactly which actions would be useful to trace after the WM is started ?

It is written :
"During tracing you can place comments (markers) into the trace by
$ echo "X is up" > /sys/kernel/debug/tracing/trace_marker
This makes it easier to see which part of the (huge) trace corresponds to
which action. It is recommended to place descriptive markers about what you
do."
But exactly which actions should I perform during the trace ?

And :
"Please, pack into a compressed archive the trace file and a free description about what you do during the trace."
Again, what is useful to do ?

Can you also specify the format of the archive file name ?
"The name of the archive file should contain the PCI id and GPU family, or the commercial name of your card."
Can you give an example name, please ?

Again :
"If you are doing a trace for a driver project, e.g. Nouveau, you should also
do the following before sending your results:
$ lspci -vvv > lspci.txt
$ dmesg > dmesg.txt
$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
and then send the .tar.gz file. The trace compresses considerably. Replace
"pciid" and "nick" with the PCI ID or model name of your piece of hardware
under investigation and your nickname."
I would like an example name for the tarball file.
Comment 18 Xavier 2011-02-06 13:36:21 UTC
(In reply to comment #17)
> I read the mmiotrace.txt file on how to use the kernel functionality.
> Can you tell me what actions exactly would be useful to be traced after the WM
> is started.
> 
> It is written :
> "During tracing you can place comments (markers) into the trace by
> $ echo "X is up" > /sys/kernel/debug/tracing/trace_marker
> This makes it easier to see which part of the (huge) trace corresponds to
> which action. It is recommended to place descriptive markers about what you
> do."
> But what actions exactly to do during the trace process ?
> 
> And :
> "Please, pack into a compressed archive the trace file and a free description
> about what you do during the trace."
> Again what is useful to do ?
> 

AFAIK with nouveau, you get corruption just by starting X.
So I think you just need to start X with the blob, mark X is up in the trace, and stop.

> Can you also precise to me the format of the name of the archive file.
> "The name of the archive file should contain the PCI id and GPU family, or the
> commercial name of your card."
> Can you give an example of name please ?
> 
> Again :
> "If you are doing a trace for a driver project, e.g. Nouveau, you should also
> do the following before sending your results:
> $ lspci -vvv > lspci.txt
> $ dmesg > dmesg.txt
> $ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
> and then send the .tar.gz file. The trace compresses considerably. Replace
> "pciid" and "nick" with the PCI ID or model name of your piece of hardware
> under investigation and your nickname."
> I would like an example of name of the tarball file.

$ lspci -n -d 10de:
01:00.0 0300: 10de:0407 (rev a1)

-> the pci id of my card is 0407 (10de is vendor id, nvidia)

$ dmesg | grep generation
[11562.063550] [drm] nouveau 0000:01:00.0: Detected an NV50 generation card (0x084700a2)

-> generation is nv50, codename nv84.

So in my case I would just call it nv84-0407-shining-mmiotrace.tar.gz
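
The naming steps above can be sketched in Python. This is only an illustration under assumptions: it expects a line in the `lspci -n` format shown above, and the helper names (LSPCI_RE, nvidia_device_id, archive_name) are invented for this example rather than taken from any nouveau tool.

```python
import re

# Matches one line of `lspci -n` output, for example:
# "01:00.0 0300: 10de:0407 (rev a1)"
LSPCI_RE = re.compile(
    r"^\S+\s+[0-9a-f]{4}:\s+([0-9a-f]{4}):([0-9a-f]{4})", re.IGNORECASE
)

NVIDIA_VENDOR_ID = "10de"


def nvidia_device_id(lspci_line):
    """Return the PCI device id if the line describes an NVIDIA device, else None."""
    match = LSPCI_RE.match(lspci_line.strip())
    if match is None or match.group(1).lower() != NVIDIA_VENDOR_ID:
        return None
    return match.group(2).lower()


def archive_name(codename, device_id, nick):
    """Build a tarball name in the codename-pciid-nick-mmiotrace.tar.gz form used above."""
    return "%s-%s-%s-mmiotrace.tar.gz" % (codename.lower(), device_id, nick)


if __name__ == "__main__":
    device = nvidia_device_id("01:00.0 0300: 10de:0407 (rev a1)")
    print(archive_name("NV84", device, "shining"))
```

The codename still has to come from the dmesg "generation" line; only the PCI id is read from lspci here.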
Comment 19 Ben Skeggs 2011-02-06 14:29:34 UTC
Created attachment 43011 [details] [review]
possible fix for nv4x/nv6x chipsets

I don't know these cards as well as curro, but we do this wrong on at least NV67, and quite possibly some others too.  Can anyone on nv4x experiencing this give this patch a shot?
Comment 20 zeruke 2011-02-06 15:45:25 UTC
(In reply to comment #19)
> Created an attachment (id=43011) [details]
> possible fix for nv4x/nv6x chipsets
> 
> I don't know these cards as well as curro, but, we do this wrong on at least
> NV67, quite possible some others too.  Can anyone on nv4x experiencing this
> give this patch a shot?

I tried, but I'm guessing I'm doing something wrong, because I get this when trying to patch:

patching file nv40_graph.c
Hunk #1 FAILED at 223.
Hunk #2 FAILED at 230.
Hunk #3 FAILED at 239.
3 out of 3 hunks FAILED -- saving rejects to file nv40_graph.c.rej
Comment 21 zeruke 2011-02-06 20:14:23 UTC
(In reply to comment #20)
I found out what I did wrong and am about to go through the whole setup in a bit.
Comment 22 zeruke 2011-02-06 21:20:53 UTC
(In reply to comment #21)
I still can't seem to get anything to work as it should, no matter where I get the instructions from.
Comment 23 Bozhan Boyadzhiev 2011-02-07 02:29:35 UTC
I have the same problem!

My video card is:

00:0d.0 VGA compatible controller: nVidia Corporation C61 [GeForce 6100 nForce 405] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: ASRock Incorporation Device 03d1
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 20
        Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at dd000000 (64-bit, non-prefetchable) [size=16M]
        Expansion ROM at dfcc0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nouveau
        Kernel modules: nouveau, nvidiafb

As I understand it, the developers need an mmiotrace, but do I have to make it with the nvidia module? We can't use it right now because of the broken ABI?!
Comment 24 Maarten Maathuis 2011-02-07 02:36:46 UTC
Nvidia only supports released xservers. zeruke is using an alpha ubuntu with a prerelease xserver. So if you have a normal release you should be fine.
Comment 25 Xavier 2011-02-07 02:41:34 UTC
Comment 19 provides a patch, so forget about the mmiotrace and just try the patch.

But you need to be able to build a kernel from source, probably from git and apply the patch there.
http://nouveau.freedesktop.org/wiki/InstallDRM
Comment 26 Ronald 2011-02-07 21:12:49 UTC
dmesg | grep generation
[drm] nouveau 0000:01:00.0: Detected an NV50 generation card (0x0a3180a2)


I boot with nouveau.noaccel=1 as mentioned in $INTERNET, but still sometimes see some corruption when moving windows. Moving the window off the screen and back helps.

Kernels 35, 37, 38
Comment 27 Ben Skeggs 2011-02-07 21:23:44 UTC
(In reply to comment #26)
> dmesg | grep generation
> [drm] nouveau 0000:01:00.0: Detected an NV50 generation card (0x0a3180a2)
> 
> 
> I boot with nouveau.noaccel=1 as mentioned in $INTERNET,but still see sometimes
> some corruptions moving windows. moving it out the screen and back helps.
> 
> kernels 35,37,38

I *highly* doubt the bug you're seeing is the same bug.  Plus, if you're seeing corruption with noaccel, it's likely not nouveau's fault at all either.
Comment 28 Ronald 2011-02-09 07:30:40 UTC
Aha, okay.

Just checked: with the distro kernel it's gone.

For 38-rc4 I will attach 2 photos.


Tell me if I should open a new bug or attach them to an already open one.
Comment 29 Ronald 2011-02-09 07:37:06 UTC
Created attachment 43162 [details]
kernel-38-rc4 with nouveau.noaccel=1

kernel-38-rc4 with nouveau.noaccel=1
Comment 30 Ronald 2011-02-09 07:39:57 UTC
Created attachment 43163 [details]
kernel-38-rc4 without nouveau.noaccel=1

kernel-38-rc4 *without* nouveau.noaccel=1

same with kernel 37
only hard reset works
Comment 31 zeruke 2011-02-09 08:19:57 UTC
Well, I still can't check the patch because for some reason I can't build it. I get errors like the kernel tree being wrong, and if not that, something about files being unexpected or expected somewhere. It might be because I'm using Ubuntu, but I'm not sure, or maybe I'm just missing a step. I am using the instructions at http://nouveau.freedesktop.org/wiki/InstallDRM

Maybe I can get one already compiled by someone? I'm using the latest rc of kernel 2.6.38 with Ubuntu's patch on it.
Comment 32 zeruke 2011-02-10 23:53:46 UTC
Created attachment 43234 [details]
after patch with res at 1280x800(16:10)

The patch fixes it to the point where I can now see for the most part, but it doesn't fix what was going on before the complete blocking of the screen.

There is still a small tiling of about 3 or so when the resolution is 1280x800 (16:10), which I believe is my screen's native resolution. That tiling goes away when I lower the resolution, which I now have at 1024x768 (4:3).

The small tiling seems to appear when the aspect ratio is 16:10 or 9:5; all resolutions with a 4:3 aspect ratio display perfectly with the patch.
Comment 33 Andy Whitcroft 2011-02-11 02:43:11 UTC
Note that Ubuntu 2.6.38-rc4 based kernels with the "possible fix for nv4x/nv6x chipsets" patch applied are available; see the downstream bug for details:

  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/711591/comments/24
Comment 34 Jakub Wilk 2011-02-14 10:17:07 UTC
I was experiencing a similar problem with GeForce 6150SE nForce 430. After rebuilding my kernel with https://bugs.freedesktop.org/attachment.cgi?id=43011 applied, the problem went away.

More details in the downstream bug report: http://bugs.debian.org/613078
Comment 35 Andy Whitcroft 2011-02-22 03:05:53 UTC
We have a couple of reports back on the downstream bug with the "possible fix for nv4x/nv6x chipsets" patch applied: one report of complete mitigation, and another that sounds like there is a second issue, though it is improved by the patch.  See comments #25 and #26 below:

    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/711591
Comment 36 Maarten Maathuis 2011-02-22 03:12:00 UTC
That patch was never meant to be tested by everyone. It was a hack for people who have problems to test.

Ben committed this 5 days ago. It seems to be in Linus' tree too.

http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?id=aaa3d08c357dcfbe13ec23786c294759183a4d8d

