Bug 78441

Summary: [NV46] MSI doesn't work
Product: xorg Reporter: aebenjam
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
Attachments:
Description                                Flags
output from dmesg with buffer set to 1M    none
Xorg log                                   none
output from lspci -nn                      none

Description aebenjam 2014-05-08 13:52:11 UTC
Created attachment 98691 [details]
output from dmesg with buffer set to 1M

X starts just fine under 3.12.x for me, but fails under 3.13.x.  I waited for a few versions to see if the problem would be found/fixed by others, but we're now up to 3.13.11 (under Fedora 19) and it still fails.  Booting back into 3.12.11 works fine.

Forgive me if more information (beyond the kernel log and Xorg.0.log) would be helpful... I'm fairly green at working at this level of the O/S.  Please ask for whatever you need.  lspci shows the graphics card as "NVIDIA Corporation G86 [Quadro NVS 290] (rev a1)" and the kernel log reports "Chipset: G72 (NV46)", "Family : NV40".

Again, please let me know if I can provide more useful diagnostic info.

Oh, and at time index 638.685456 of the kernel log I had started switching to alt-consoles.  Until that point there were no additional log messages generated.

Thanks,

Adam Benjamin
Comment 1 aebenjam 2014-05-08 13:53:18 UTC
Created attachment 98692 [details]
Xorg log
Comment 2 Ilia Mirkin 2014-05-08 14:01:41 UTC
There does not appear to be any indication of a G86 card in your computer. PCI ID 10de:01d7 corresponds to G72 [Quadro NVS 110M / GeForce Go 7300] according to http://envytools.readthedocs.org/en/latest/pciid.html. And we read that it's a NV46 ( = G72) from the card's mmio register 0, further confirming that it's a G72...

Can you provide the output of "lspci -nn"?

The only relevant change between 3.12 and 3.13 that I can think of is that we added support for MSI. You can try disabling it by adding

nouveau.config=NvMSI=0

to your kernel commandline. If that doesn't help, you'll have to do a bisect.
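
One way to make the option persistent on a GRUB 2 system such as Fedora 19 is to append it to GRUB_CMDLINE_LINUX in /etc/default/grub and regenerate the GRUB configuration. This is a sketch and not something taken from this report: "rhgb quiet" is a placeholder for whatever options the file already contains, and the grub.cfg path in the comment assumes a BIOS install.

```shell
# /etc/default/grub (fragment)
# "rhgb quiet" is a placeholder for the options already present (assumption).
GRUB_CMDLINE_LINUX="rhgb quiet nouveau.config=NvMSI=0"

# Afterwards, regenerate the boot configuration as root and reboot, e.g.:
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

For a one-off test, the same option can instead be appended to the kernel line by editing the boot entry from the GRUB menu, which leaves the files on disk unchanged.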
Comment 3 aebenjam 2014-05-08 14:07:04 UTC
Created attachment 98693 [details]
output from lspci -nn
Comment 4 aebenjam 2014-05-08 14:09:09 UTC
Booting with nouveau.config=NvMSI=0 didn't change things.  No idea what a "bisect" is.  Please advise...

Thanks for the assistance.
Comment 5 Ilia Mirkin 2014-05-08 14:14:32 UTC
(In reply to comment #4)
> Booting with nouveau.config=NvMSI=0 didn't change things.  No idea what a
> "bisect" is.  Please advise...

git bisect. Grab a clone of the tree, and use git bisect between the good and bad tags (v3.12 and v3.13) to identify the offending commit. You can probably restrict the bisect to drivers/gpu/drm/nouveau.

IOW, you can run

git bisect start v3.13 v3.12 -- drivers/gpu/drm/nouveau

There are more detailed online guides that show how to use git bisect, and you may even find something that's specific to your distro if you're not familiar with building kernels.
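
The mechanics of a bisect can be tried on a throwaway repository before committing to kernel builds. The sketch below is a toy, not the kernel workflow: the file name driver.c, the "BUG" marker, and the eight commits are all made up for illustration. It uses the same argument order as the command above (bad revision, good revision, then a path restriction) and lets git bisect run drive the search with a test command.

```shell
#!/bin/sh
# Toy git bisect run on a throwaway repository (not the kernel tree).
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Eight commits; the made-up regression ("BUG") lands in commit 5.
for i in 1 2 3 4 5 6 7 8; do
    echo "change $i" >> driver.c
    if [ "$i" -eq 5 ]; then
        echo "BUG" >> driver.c
    fi
    git add driver.c
    git commit -q -m "commit $i"
done

good=$(git rev-list --max-parents=0 HEAD)   # first commit, known good
# Bad revision first, then good revision, then the path restriction.
git bisect start HEAD "$good" -- driver.c >/dev/null

# The test command's exit status marks each step: 0 = good, 1 = bad.
git bisect run sh -c '! grep -q BUG driver.c' >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"

git bisect reset >/dev/null 2>&1
cd /
rm -rf "$demo"
```

With a kernel there is no test command to automate, so each step is done by hand: build and boot the revision git checks out, then run "git bisect good" or "git bisect bad" depending on whether X starts, and repeat until git names the first bad commit.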

Unrelated to this, your lspci shows no indication of a G86. Are you sure you saw that on the same computer?
Comment 6 aebenjam 2014-05-08 14:23:19 UTC
Gah, very sorry - the G86 is from my desktop.  (Read: I'm an idiot.)

The laptop, obviously, shows:  "01:00.0 VGA compatible controller: NVIDIA Corporation G72M [Quadro NVS 110M/GeForce Go 7300] (rev a1)"   Very sorry for the red herring...

I'll try to find time to try a git bisect, but my experience with compiling Linux kernels directly dates back to 2.4.x, if not prior, and really I'm a bit lost at that level; i.e., I might spend a lot of time with very little to show for it.

I'm open to suggestions... and thanks again for the attention.
Comment 7 aebenjam 2014-05-23 19:34:41 UTC
Not sure that it's helpful, but I tried with the latest kernel (provided by Fedora 19 - 3.14.4-100.fc19.i686) and it still failed to start the X session.  I noticed, on the console while shutting down, that I was seeing messages:

E[Xorg[609]] failed to idle channel 0xcccc0001 [Xorg[609]]
E[Xorg[609]] failed to idle channel 0xcccc0000 [Xorg[609]] 

Not sure if that's useful to indicate what's going on.  It also took a while for it to finally shut down.

But, flipped back to 3.12.11-201.fc19.i686 and I'm up and running fine.

Adam
Comment 8 Ilia Mirkin 2014-05-23 19:44:59 UTC
(In reply to comment #7)
> Not sure that it's helpful, but I tried with the latest kernel (provided by
> Fedora 19 - 3.14.4-100.fc19.i686) and it still failed to start the X
> session.  I noticed, on the console while shutting down, that I was seeing
> messages:
> 
> E[Xorg[609]] failed to idle channel 0xcccc0001 [Xorg[609]]
> E[Xorg[609]] failed to idle channel 0xcccc0000 [Xorg[609]] 

This is an indication that the GPU hung.

This is not the sort of thing that'll just fix itself... a bisect would probably identify the offending commit, which would probably be enough to figure out what the issue is. I scanned through all the nouveau commits between 3.12 and 3.13, and nothing jumped out as possibly causing this, but a lot of times there are completely unintended effects.
Comment 9 aebenjam 2014-05-30 19:43:40 UTC
So, I'm not sure about doing a bisect, but you can download the source RPMs for the two versions kernel-3.12.11-201.fc19 and kernel-3.13.4-100.fc19 which Red Hat released, here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=498402

and

http://koji.fedoraproject.org/koji/buildinfo?buildID=499637

You can download the .src.rpm easily and then diff the nouveau trees.  I can attach that diff if you like.

Thing is I'm not enough of a developer to (easily) take that any further.  I'm happy to help, but otherwise I'm an IT geek and out of my depth trying to work on video drivers.  Sorry.

Let me know if I can assist further.  Otherwise, for the moment, I'm stuck on a 3.12.x kernel.  :(

Adam
Comment 10 Ilia Mirkin 2014-05-30 19:47:42 UTC
(In reply to comment #9)
> So, I'm not sure about doing a bisect, but you can download the source RPMs
> for the two versions kernel-3.12.11-201.fc19 and kernel-3.13.4-100.fc19
> which Red Hat released, here:
> 
> http://koji.fedoraproject.org/koji/buildinfo?buildID=498402
> 
> and
> 
> http://koji.fedoraproject.org/koji/buildinfo?buildID=499637
> 
> You can download the .src.rpm easily and then diff the nouveau trees.  I can
> attach that diff if you like.

As something like "git diff v3.12..v3.13 -- drivers/gpu/drm/nouveau" would provide. We know what changed. We don't know which change broke things.

> 
> Thing is I'm not enough of a developer to (easily) take that any further. 
> I'm happy to help, but otherwise I'm an IT geek and out of my depth trying
> to work on video drivers.  Sorry.
> 
> Let me know if I can assist further.  Otherwise, for the moment, I'm stuck
> on a 3.12.x kernel.  :(

OK, if you ever want to get unstuck, figure out how to build your own kernels (loads of guides online), check out a git tree, and do a git bisect between v3.12 and v3.13 (again, tons of online guides).
Comment 11 aebenjam 2014-06-07 12:40:23 UTC
3c792a15ec799c27d634b102b605f3ec32c033c3 is the first bad commit
commit 3c792a15ec799c27d634b102b605f3ec32c033c3
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Fri Oct 11 14:56:39 2013 +1000

    drm/nouveau/mc: fetch NV_PMC_INTR again after re-arming MSI
    
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

:040000 040000 4ddfe2602412ff37217b60b8a28c7594db6e1cf2 206d1ce23e7d756a88db9da9965e70da9150ac0b M      drivers
Comment 12 Ilia Mirkin 2014-06-07 16:12:05 UTC
Very interesting. And you said that booting the latest kernel with

nouveau.config=NvMSI=0

didn't help? Can you provide a boot log from a (recent) kernel booted with MSI disabled as per above? Just want to make sure that the parameter is being applied properly.
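
A quick way to check whether the parameter reached the running kernel is to look for it in /proc/cmdline. The sketch below is one possible check, not part of the original request: the helper name cmdline_has is hypothetical, and the remark about /proc/interrupts describes what is typically seen rather than guaranteed output.

```shell
# Return 0 if the kernel command line in $1 contains the exact option $2.
cmdline_has() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

opt="nouveau.config=NvMSI=0"

if [ -r /proc/cmdline ]; then
    if cmdline_has "$(cat /proc/cmdline)" "$opt"; then
        echo "$opt is on the running kernel's command line"
    else
        echo "$opt is NOT on the running kernel's command line"
    fi
fi

# The interrupt type listed for nouveau usually mentions PCI-MSI when MSI
# is in use (assumption about typical output; prints nothing if no match).
if [ -r /proc/interrupts ]; then
    grep -i nouveau /proc/interrupts || true
fi
```

The match is on whole space-separated words, so a mistyped variant of the option does not count as present.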
Comment 13 aebenjam 2014-06-07 17:11:33 UTC
It's like the kernel hates me and is trying to make a liar of me.  I *know* I tried it before and it didn't work - but it's totally doing the job now.  Booted up just fine.  Only thing I can think of is that it's a slightly newer kernel since my previous test... although I'm guessing you're going to tell me nothing changed.  Anyway, confirmed as working properly under 3.14.4-100.fc19.i686 with nouveau.config=NvMSI=0

Thank you very much for your help.  What is it I've just disabled?

Adam
Comment 14 Ilia Mirkin 2014-06-07 17:26:21 UTC
http://en.wikipedia.org/wiki/Message_Signaled_Interrupts

It's a non-critical feature. Support for using MSI on nvidia cards was added in kernel 3.13, which was why I had suggested trying to disable it.

I guess the rearm register must be somewhere else, or something extra needs to be done...

OTOH, it's very interesting that you bisected it to that particular commit. It's not the commit that turns on MSI, but rather a commit that fixes MSI for some situations, making it more reliable. But apparently not for you...
Comment 15 aebenjam 2014-06-07 17:29:03 UTC
Okay, I'm glad the bisect helped, then.  Took a fair bit of time - had to start from a clean O/S build on a new HD because I was hitting unrelated bugs with LVM which were leading me to "skip" a lot of tries.  Things went much faster after I installed on bare partitions.

Thanks again for the help,

Adam
