Bug 27706 - Blank screen and "Error referencing VRAM ctxdma: -12" on NV44A with ubuntu lucid beta 2 (worked in alpha 3)
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: 7.5 (2009.10)
Hardware: x86 (IA32) Linux (All)
Importance: medium critical
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2010-04-16 13:04 UTC by mossroy
Modified: 2010-04-17 09:44 UTC



Attachments
Dmesg on ubuntu lucid beta 2 installed (does not work) (40.83 KB, text/plain)
2010-04-16 13:04 UTC, mossroy
Dmesg on ubuntu lucid alpha 3 live CD (works) (44.32 KB, text/plain)
2010-04-16 13:05 UTC, mossroy
Dmesg on ubuntu lucid beta 2 installed, with nouveau.noaccel=1 (works) (40.12 KB, text/plain)
2010-04-16 13:06 UTC, mossroy
Photo of grey lines that appear with option "video=vga16fb:off" (702.19 KB, image/jpeg)
2010-04-16 15:15 UTC, mossroy
Dmesg on ubuntu lucid beta 2 installed, with video=vga16fb:off (39.78 KB, text/plain)
2010-04-16 15:16 UTC, mossroy
kern.log on ubuntu lucid beta 2 installed, with video=vga16fb:off (61.72 KB, text/plain)
2010-04-16 15:17 UTC, mossroy
Dmesg on ubuntu lucid beta 2 installed, with updated PPA packages (works) (40.25 KB, text/plain)
2010-04-17 09:44 UTC, mossroy

Description mossroy 2010-04-16 13:04:51 UTC
Created attachment 35110 [details]
Dmesg on ubuntu lucid beta 2 installed (does not work)

I can't start the ubuntu lucid beta 2 liveCD on a computer with an Nvidia GeForce 6200 (NV44A rev A1) : the screen stays blank after booting.
It works properly with an ubuntu lucid alpha 3 liveCD (which probably uses an earlier version of nouveau) and with the ubuntu karmic liveCD (which uses the nv driver)

I managed to install the beta 2 on this computer through the alternate CD : the behavior is the same when booting on the hard drive.
I installed all the updates (as of 16th April 2010) : same behavior

I experience exactly the same behavior with Fedora live : Fedora 12 live boots correctly (but relies on the "nv" driver), while Fedora 13 beta live, which uses the "nouveau" driver, gives the same blank screen.

I think this is a regression in the "nouveau" graphic driver : the version bundled with lucid alpha 3 works, the one with lucid beta 2 (or fedora 13 beta) does not.
Here is an excerpt of the dmesg on beta 2 :
  nouveau 0000:03:00.0: RAMHT space exhausted. ch=0
  nouveau 0000:03:00.0: Error referencing VRAM ctxdma: -12
  nouveau 0000:03:00.0: gpuobj -12

This error message does not appear when using lucid alpha 3.
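
A minimal sketch for pulling these nouveau errors out of a saved dmesg and decoding the trailing number. It assumes the value follows the kernel's usual negative-errno convention, under which -12 corresponds to ENOMEM. The function name and regular expression are illustrative and not part of nouveau.

```python
import errno
import os
import re

# Matches lines such as "nouveau 0000:03:00.0: gpuobj -12" and captures
# the message text and the trailing negative code.
NOUVEAU_ERR = re.compile(r"nouveau \S+: (?P<msg>.*?)\s*(?P<code>-\d+)\s*$")


def decode_nouveau_errors(log_text):
    """Return (message, errno name, description) for each nouveau line
    that ends in a negative number (assumed to be a negative errno)."""
    results = []
    for line in log_text.splitlines():
        match = NOUVEAU_ERR.search(line)
        if match is None:
            continue
        code = -int(match.group("code"))
        name = errno.errorcode.get(code, "UNKNOWN")
        message = match.group("msg").rstrip(": ")
        results.append((message, name, os.strerror(code)))
    return results
```

Feeding it the three lines quoted above would report the "Error referencing VRAM ctxdma" and "gpuobj" lines as ENOMEM, and skip the "RAMHT space exhausted" line because it carries no error code.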

I found that adding "nouveau.noaccel=1" as a boot parameter is a workaround, both on ubuntu lucid beta 2 and on fedora 13 beta
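
To confirm that the workaround parameter really reached the kernel, the boot command line can be inspected. The sketch below parses a command-line string such as the contents of /proc/cmdline; the helper names are hypothetical, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.
    Bare flags without '=' map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def nouveau_accel_disabled(cmdline):
    """True when nouveau.noaccel=1 is present on the command line."""
    return parse_cmdline(cmdline).get("nouveau.noaccel") == "1"
```

On a running system the input would typically come from `open("/proc/cmdline").read()`.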

My motherboard is an Asus A7N8X-E deluxe, with a GeForce 6200 video card (branded MSI) in the AGP port.

You will find attached the dmesg of :
- ubuntu lucid beta 2 installed (does not work)
- ubuntu lucid alpha 3 liveCD (works)
- ubuntu lucid beta 2 installed, with nouveau.noaccel=1 (works)

lspci gives :
03:00.0 VGA compatible controller: nVidia Corporation NV44A [Geforce 6200] (rev a1)

I first opened this bug on ubuntu launchpad : https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/564617
Comment 1 mossroy 2010-04-16 13:05:40 UTC
Created attachment 35111 [details]
Dmesg on ubuntu lucid alpha 3 live CD (works)
Comment 2 mossroy 2010-04-16 13:06:34 UTC
Created attachment 35112 [details]
Dmesg on ubuntu lucid beta 2 installed, with nouveau.noaccel=1 (works)
Comment 3 mossroy 2010-04-16 13:35:21 UTC
Here are the different versions of libraries used by ubuntu (I don't know which ones are relevant in this case) :

On beta 2 (with all current updates) :
xorg 1:7.5+5ubuntu1
libdrmnouveau1 2.4.18-1ubuntu3
xserver-xorg-video-nouveau 1:0.0.15+git20100219+9b4118d-0ubuntu5
kernel 2.6.32-21-generic (on x86)

On alpha 3 liveCD :
xorg 1:7.5+1ubuntu8
libdrmnouveau1 2.4.18-1ubuntu2
xserver-xorg-video-nouveau 1:0.0.15+git20100219+9b4118d-0ubuntu2
kernel 2.6.32-14-generic (on x86)
Comment 4 Xavier 2010-04-16 14:42:47 UTC
Ok first, I noticed these strange differences :
dmesg-beta2:[    6.686828] Linux agpgart interface v0.103
dmesg-beta2:[    6.709693] agpgart: Detected NVIDIA nForce2 chipset
dmesg-beta2:[    6.725012] agpgart-nvidia 0000:00:00.0: AGP aperture is 64M @ 0xe0000000
dmesg-beta2:[    8.814073] agpgart-nvidia 0000:00:00.0: AGP 3.0 bridge
dmesg-beta2:[    8.814090] agpgart: modprobe tried to set rate=x12. Setting to AGP3 x8 mode.
dmesg-beta2:[    8.814097] agpgart-nvidia 0000:00:00.0: putting AGP V3 device into 8x mode
dmesg-alpha3:[    4.397083] Linux agpgart interface v0.103
dmesg-alpha3:[    5.005571] agpgart: Detected NVIDIA nForce2 chipset
dmesg-alpha3:[    5.012047] agpgart-nvidia 0000:00:00.0: AGP aperture is 64M @ 0xe0000000

It might be worth understanding why that changed first, before looking at the rest. Unless someone tells you otherwise :)
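
A comparison like the one above can be reproduced from two saved dmesg files. This is only a sketch: it assumes raw dmesg lines with a leading `[seconds.micros]` timestamp, strips that timestamp so identical messages compare equal, and lists the agpgart lines present in one log but not the other. The function names are illustrative.

```python
import re

# Leading dmesg timestamp, e.g. "[    6.686828] "
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def agp_lines(dmesg_text):
    """Return agpgart-related messages with timestamps removed."""
    lines = []
    for line in dmesg_text.splitlines():
        message = TIMESTAMP.sub("", line.strip())
        if "agpgart" in message:
            lines.append(message)
    return lines


def only_in_first(first, second):
    """agpgart messages that appear in `first` but not in `second`."""
    seen = set(agp_lines(second))
    return [msg for msg in agp_lines(first) if msg not in seen]
```

Called with the beta 2 log first and the alpha 3 log second, it would return the three extra AGP 3.0 / 8x lines shown in the excerpt.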

(In reply to comment #3)
> Here are the different versions of libraries used by ubuntu (I don't know which
> ones are relevant in this case) :
> 
> On beta 2 (with all current updates) :
> xorg 1:7.5+5ubuntu1
> libdrmnouveau1 2.4.18-1ubuntu3
> xserver-xorg-video-nouveau 1:0.0.15+git20100219+9b4118d-0ubuntu5
> kernel 2.6.32-21-generic (on x86)
> 
> On alpha 3 liveCD :
> xorg 1:7.5+1ubuntu8
> libdrmnouveau1 2.4.18-1ubuntu2
> xserver-xorg-video-nouveau 1:0.0.15+git20100219+9b4118d-0ubuntu2
> kernel 2.6.32-14-generic (on x86)

I suppose you can exclude xorg and xorg nouveau driver, since you apparently get that problem right when nouveau initializes, before touching X.

There remains libdrm nouveau and the kernel. The problem is that these are ubuntu packages, and to people external to ubuntu (like all nouveau developers, and many users, like me), it's not easy to know exactly what code that is using.

If you could point out the equivalent upstream code, it would help.
For libdrm : http://cgit.freedesktop.org/mesa/drm/
For drm/ttm : http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=shortlog;h=drm-next
For nouveau drm : http://cgit.freedesktop.org/nouveau/linux-2.6/

More importantly, how do we know that this latest upstream code would not work for you ?
Comment 5 Marcin Slusarz 2010-04-16 14:48:11 UTC
please try video=vga16fb:off kernel parameter and attach kernel log
Comment 6 mossroy 2010-04-16 15:10:55 UTC
Thanks Xavier and Marcin for your quick answers.

Xavier, I had noticed the extra lines regarding AGP x8 (see https://bugs.launchpad.net/nouveau/+bug/564617/comments/3).
What is sure is that I did not touch anything inside the PC, or in the BIOS.

Is there an easy way to test with an upstream version of nouveau? I might run a liveCD, or even install another OS if it's really necessary.

Marcin,
I tried the "video=vga16fb:off" parameter : I can see the ubuntu logo, but then I only see grey lines on the screen (see attached photo)
Then, the PC seems frozen : I can only do a hard power-off
You'll find the dmesg and corresponding kern.log attached
Comment 7 mossroy 2010-04-16 15:15:38 UTC
Created attachment 35118 [details]
Photo of grey lines that appear with option "video=vga16fb:off"
Comment 8 mossroy 2010-04-16 15:16:30 UTC
Created attachment 35120 [details]
Dmesg on ubuntu lucid beta 2 installed, with video=vga16fb:off
Comment 9 mossroy 2010-04-16 15:17:11 UTC
Created attachment 35123 [details]
kern.log on ubuntu lucid beta 2 installed, with video=vga16fb:off
Comment 10 mossroy 2010-04-16 15:21:08 UTC
Maybe I should add that my screen is plugged on the DVI interface, not the VGA one
The AGP x8 is enabled in the BIOS
Comment 11 mossroy 2010-04-16 15:31:56 UTC
Regarding the version number of libdrmnouveau1, normally ubuntu uses a prefix for the version upstream, and a suffix for the patches it applies on it.
So, I suppose it is based on version 2.4.18 ( http://cgit.freedesktop.org/mesa/drm/tag/?id=2.4.18 )
The changelog of the ubuntu patches is there : http://changelogs.ubuntu.com/changelogs/pool/main/libd/libdrm/libdrm_2.4.18-1ubuntu3/changelog
The difference between version "ubuntu2" and "ubuntu3" seems to be the correction of this bug : https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/547124 , that did not apparently affect me, and that corresponds to the following commit : http://cgit.freedesktop.org/mesa/drm/commit/?id=df32c307e8f81b46ee8aa4dd7222fc18f175bbb3
Not sure if it's really relevant in our case.
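
The prefix/suffix convention described above follows the Debian version layout `[epoch:]upstream[-revision]`, where the revision starts after the last hyphen. A small sketch that separates the three parts (the function name is made up for illustration, and no validation of the fields is attempted):

```python
def split_debian_version(version):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision).
    A missing epoch is reported as '0', a missing revision as ''."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return epoch, upstream, revision
```

Applied to `2.4.18-1ubuntu3`, this yields upstream `2.4.18` and revision `1ubuntu3`, which matches the assumption that the package is based on upstream 2.4.18.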

Maybe version alpha3 of ubuntu had the option "nouveau.noaccel=1" by default, and not in version beta2? Which would explain the difference of behavior even if the versions are very close.
Comment 12 mossroy 2010-04-16 15:41:54 UTC
Regarding the kernel used by ubuntu, the changelog is there : http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_2.6.32-21.32/changelog

There have been a few changes that consist of disabling acceleration for specific cards. See for example https://bugs.launchpad.net/ubuntu/+source/linux/+bug/544088
Well, disabling acceleration is a workaround in my opinion, not a real fix...
Comment 13 mossroy 2010-04-16 23:45:32 UTC
In any case, I don't think this could be due to ubuntu patches because the behavior is the same with fedora 13 beta

The versions used by fedora seem a bit more recent :
kernel 2.6.33.1-24-fc13.i686
xorg-x11-drv-nouveau 1:0.0.16-2.20100218git2964702.fc13
libdrm 2.4.19-1.fc13
Comment 14 Xavier 2010-04-17 02:48:09 UTC
(In reply to comment #13)
> In any case, I don't think this could be due to ubuntu patches because the
> behavior is the same with fedora 13 beta
> 
> The versions used by fedora seem a bit more recent :
> kernel 2.6.33.1-24-fc13.i686
> xorg-x11-drv-nouveau 1:0.0.16-2.20100218git2964702.fc13
> libdrm 2.4.19-1.fc13

Fedora has the advantage that it's maintained and supported by a nouveau developer (actually the main one working on the kernel side). But you should also use the distribution's bug tracker in that case.

Going back to ubuntu, it's not just a matter of what custom patches they have applied (though that's very important to know too), but also what changed between the working and broken version.
You said the main new change was a quirk to disable accel ? How could that be the problem, since the new version only works when accel is disabled, and you need to do that manually ?
Anyway it seems the noaccel quirks do not affect your card, and should cause a message to be displayed when they are used.
http://people.canonical.com/~apw/raof-nv-accel-lucid/

Building latest code manually has several advantages :
- check if it has already been fixed there
- easy way for everyone to see what code you tried
- you can update whenever you want, and quickly apply patches if needed
- if it's indeed a regression, you can git bisect it to find the offending commit

Instructions are there :
http://nouveau.freedesktop.org/wiki/InstallNouveau
http://nouveau.freedesktop.org/wiki/InstallDRM

PS : a lot of what I said here is not specific to this bug, it's more a general concern about how nouveau bugs should be handled
Comment 15 Christopher James Halse Rogers 2010-04-17 04:06:19 UTC
So, the pertinent changes between Lucid Alpha 3 and Beta 2 would be:
Alpha 3 had the nouveau kernel module + drm + ttm from linux 2.6.33 + the ctxprog voodoo generator backported, in the out-of-tree linux-backports-modules.
Beta 2 has the drm stack from 2.6.33.2 + a number of fixes pulled in from nouveau/linux-2.6

If you're interested in bisecting, it should be quicker to start with 2.6.33.2 - if that works, then the problem is in one of the small number of backported nouveau patches we've got.  If it doesn't, it should be relatively quick to bisect from 2.6.33 to 2.6.33.2
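
The idea behind bisecting can be sketched as a binary search over an ordered list of revisions. This is an illustration only, not how `git bisect` is invoked: `revisions` and the `is_good` test are hypothetical placeholders, and the search assumes the first revision is good, the last is bad, and that every revision after the first bad one is also bad.

```python
def first_bad(revisions, is_good):
    """Return the index of the first bad revision.
    Assumes revisions[0] is good, revisions[-1] is bad, and that
    the good -> bad transition happens exactly once."""
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid   # breakage is later than mid
        else:
            hi = mid   # breakage is at mid or earlier
    return hi
```

Each step halves the remaining range, so the number of builds to test grows only with the logarithm of the number of commits, which is why a short range such as 2.6.33 to 2.6.33.2 should be quick to cover.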

If you're uncomfortable building this stack from source, the xorg-edgers PPA has recent snapshots, available here: https://edge.launchpad.net/~xorg-edgers/+archive/ppa

If that works, then this bug can be closed because it's already fixed.  If that doesn't work then you'll probably need to do some building from source in order to identify which change broke it.
Comment 16 Marcin Slusarz 2010-04-17 04:58:48 UTC
Before you start bisecting:
video=vga16fb:off does not work because in Ubuntu vga16fb is compiled as a module and kernel parameters do not affect it - you have to blacklist it by hand.
There's "vga16fb: not registering due to another framebuffer present" (which is an Ubuntu addon) in the kernel log, so it should not load, BUT it exercises a broken failure path in vga16fb which iounmaps a memory region that was not ioremapped by vga16fb (VGA_MAP_MEM is defined differently on different architectures - on x86 it's NOT ioremap), so it might affect nouveau - I need to investigate it more.

In reply to comment 13:
IIRC Fedora does not enable vga16fb, so either:
- it might be a different bug
- or this bug is not related to vga16fb at all
Comment 17 mossroy 2010-04-17 09:33:59 UTC
Christopher, you were right.
After upgrading to the PPA xorg-edgers packages, the problem seems to be solved : I can boot successfully without changing any boot parameter.

So, it looks like the problem has already been solved in nouveau.
For further reference, the PPA version number of libdrm is 2.4.20+git20100404.c7650003-0ubuntu0sarvatt3 (instead of 2.4.18-1ubuntu3)
xserver-xorg-video-nouveau version is 1:0.0.15+git20100416.40636169-0ubuntu0sarvatt instead of 1:0.0.15+git20100219+9b4118d-0ubuntu5

The fix must be between these 2 versions.

Sorry for the inconvenience, and thank you to all who had a look at this issue.
Comment 18 mossroy 2010-04-17 09:44:36 UTC
Created attachment 35135 [details]
Dmesg on ubuntu lucid beta 2 installed, with updated PPA packages (works)

