Bug 61287

Summary: (bisected) kernel>=3.7 breaks nouveau with acceleration for GeForce 6150SE nForce 430
Product: xorg Reporter: gabriele balducci <balducci>
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=50091
Attachments:
Description                       Flags
kernel configuration              none
x server configuration            none
xorg log after crash              none
dmesg after crash                 none
last part of dmesg after crash    none

Description gabriele balducci 2013-02-22 15:45:51 UTC
Hello,

I am experiencing X11 problems with kernel>=3.7 (including latest 3.8
release) with the following GPU:

   NVIDIA Corporation C61 [GeForce 6150SE nForce 430] (rev a2)

I am on an AMD Athlon 64 X2 Dual Core Processor, running in
32bit mode and using DRM/NOUVEAU in KMS.

Further (possibly) useful information about software versions is:

          mesa                9.0.2   
          libdrm              2.4.42  
          xf86-video-nouveau  1.0.6   
          xorg-server         1.13.2  


I can boot just fine in KMS.

Also X11 (which I run with startx) starts nicely and is *almost*
perfectly functional (which is why it took me a while to pin down the
problem).

The problem is apparently related to acceleration: if I run glxgears
(or vmd, a molecular visualization program which uses acceleration), the
X server suddenly appears overloaded and unstable (e.g. I can no longer
move the pointer); within some 10-20 sec I can escape to the
console with CTRL-ALT-F1 and manually kill the X server with
CTRL-c. This takes a while, however, and in the meantime I get the
following message repeated on the console, apparently from some nouveau
component:

   nouveau E[     891] failed to idle channel 0xcccc0000

After this, the GPU is apparently left in a sort of locked state: if I
restart X, the screen shows the image from just before the crash and is
totally unresponsive. However, I can still shut down X with CTRL-ALT-F1
and manually kill the server. To get X running again I have
to reboot the machine.
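
For anyone wanting to check a saved dmesg or console log for the same symptom, here is a minimal Python sketch that tallies the "failed to idle channel" message per channel. The pattern is modeled only on the single line quoted above, and the function name and the two-line sample are made up for illustration; other nouveau messages may well be formatted differently.

```python
import re

# Pattern modeled on the one message quoted in this report (an assumption:
# other kernels or nouveau versions may print it differently).
IDLE_RE = re.compile(
    r"nouveau E\[\s*(\d+)\] failed to idle channel (0x[0-9a-fA-F]+)"
)


def count_idle_failures(log_text):
    """Return {channel: count} for 'failed to idle channel' lines."""
    counts = {}
    for match in IDLE_RE.finditer(log_text):
        channel = match.group(2)
        counts[channel] = counts.get(channel, 0) + 1
    return counts


# Illustrative sample: the quoted line, repeated twice.
sample = (
    "nouveau E[     891] failed to idle channel 0xcccc0000\n"
    "nouveau E[     891] failed to idle channel 0xcccc0000\n"
)
print(count_idle_failures(sample))  # {'0xcccc0000': 2}
```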

The problem is completely reproducible.

If I boot into kernel-3.6.10 (everything else unchanged), none of the
above happens and glxgears (and vmd) work absolutely fine.


I have bisected the kernel commits from 3.6.10 to 3.7-rc1 in
drivers/gpu/drm/nouveau/ and it turns out that:

     612a9aab56a93533e76e3ad91642db7033e03b69 is the first bad commit

    commit 612a9aab56a93533e76e3ad91642db7033e03b69
    Merge: 3a49431 268d283
    Author: Linus Torvalds <torvalds@linux-foundation.org>
    Date:   Wed Oct 3 23:28:59 2012 -0700

        Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux

        Pull drm merge (part 1) from Dave Airlie:
         "So first of all my tree and uapi stuff has a conflict mess, its my
          fault as the nouveau stuff didn't hit -next as were trying to rebase
          regressions out of it before we merged.

          Highlights:
           - SH mobile modesetting driver and associated helpers
           - some DRM core documentation
           - i915 modesetting rework, haswell hdmi, haswell and vlv fixes, write
             combined pte writing, ilk rc6 support,
           - nouveau: major driver rework into a hw core driver, makes features
             like SLI a lot saner to implement,
           - psb: add eDP/DP support for Cedarview
           - radeon: 2 layer page tables, async VM pte updates, better PLL
             selection for > 2 screens, better ACPI interactions
    [...]
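
For readers unfamiliar with bisection, the idea behind the search above can be sketched in Python as a binary search over an ordered list of commits. This is only a simplified model: real kernel history is a graph with merges rather than a straight line, and the commit labels and the `is_bad` test below are hypothetical stand-ins, not real commit ids or a real test.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit in an ordered list.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Hypothetical linear history: labels only, not real commit ids.
history = ["c0", "c1", "c2", "c3", "c4", "c5"]
broken = {"c3", "c4", "c5"}
print(first_bad(history, lambda c: c in broken))  # c3
```

Note that the result is only as reliable as the good/bad verdict at each step: a single wrong verdict sends the search into the wrong half.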


Problems for the same GPU with kernel>=3.7 have already been reported
here:

      https://bugzilla.redhat.com/show_bug.cgi?id=905629

However, it is not clear to me whether that report and the present one
deal with the same thing.

Please, find enclosed:
=> kernel configuration
=> xorg configuration
=> xorg log after the crash
=> dmesg after the crash


(To avoid possible confusion: I opened a bug here some time ago for
another problem with another NVIDIA GPU (Bug 58776). In that report I
said that this GPU was working fine; of course, I had not yet
discovered the present issue.)


I thank you very much in advance for any help

ciao
gabriele
Comment 1 gabriele balducci 2013-02-22 15:47:09 UTC
Created attachment 75326 [details]
kernel configuration
Comment 2 gabriele balducci 2013-02-22 15:58:02 UTC
Created attachment 75327 [details]
x server configuration
Comment 3 gabriele balducci 2013-02-22 16:03:29 UTC
Created attachment 75328 [details]
xorg log after crash
Comment 4 gabriele balducci 2013-02-22 16:04:50 UTC
Created attachment 75329 [details]
dmesg after crash
Comment 5 gabriele balducci 2013-02-25 08:53:06 UTC
Following comment #8 in
https://bugs.freedesktop.org/show_bug.cgi?id=61321 (which deals
with the same GPU), the problems go away if I boot 3.7.9 with
nouveau.noaccel=1.

Of course, this means that glxgears and vmd run without crashing
the X11 session, *BUT* without acceleration. In particular, vmd without
acceleration is unacceptably slow for me, so running without
acceleration is not an option...

However, this reinforces the idea that the problem is definitely
related to acceleration.
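
To confirm that such an option really reached the kernel, the command line can be inspected. A minimal Python sketch, assuming the text comes from /proc/cmdline on the running system; the function name and the `root=/dev/sda1 ro` part of the example string are made up for illustration.

```python
def nouveau_params(cmdline):
    """Extract nouveau.* options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("nouveau."):
            # Split on the first '=' only, so a value such as
            # 'NvPCIE=0' stays intact.
            key, _, value = token[len("nouveau."):].partition("=")
            params[key] = value
    return params


# Example string; on a real system read the text from /proc/cmdline.
example = "root=/dev/sda1 ro nouveau.noaccel=1"
print(nouveau_params(example))  # {'noaccel': '1'}
```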
Comment 6 georgy 2013-08-06 18:30:08 UTC
Created attachment 83730 [details]
last part of dmesg after crash
Comment 7 georgy 2013-08-06 18:31:38 UTC
I got the same error on i686.
linux 3.10.4-300.fc19.i686.PAE #1 SMP 
VGA NVIDIA Corporation NV31 [GeForce FX 5600XT]
Comment 8 Emil Velikov 2013-08-06 19:58:31 UTC
Gabriele

Can you try booting with "nouveau.config=NvPCIE=0"? It will force nouveau to use the nv04-style VM and might resolve your issue.

If that does not work out, can you redo the bisection?


Georgy

Please open a separate bug following the instructions [1]


Cheers
Emil

[1] http://nouveau.freedesktop.org/wiki/Bugs/
Comment 9 Frank Schaefer 2013-08-06 21:07:55 UTC
See https://bugzilla.kernel.org/show_bug.cgi?id=50091

Bisection is utterly broken.
Comment 10 gabriele balducci 2013-08-07 08:24:36 UTC
hello,
I am presently away from the machine for which I reported the problem, and playing with remote reboots is too risky for me at the moment.

However: for me the problem apparently went away somewhere in the 3.8.x or early 3.9.x window: at present (3.10.5) acceleration does work (only the vmd molecular visualization code seems to have problems with recent (>=9.1.x) mesa)

Looking into my logs, I am pretty sure I tried this nouveau.config=NvPCIE=0 kernel option, but with no success.

I note that, at present, the GPU seems to work fine with acceleration enabled: glxgears runs without problems and so does vmd (with the mesa limitation mentioned above).

This makes me think that the problem reported by georgy for the different GPU might be actually different.

Also, frank: you are running the same GPU: did you try a recent (3.10.x) kernel?

Unfortunately, I am not able to say exactly WHICH kernel version did the trick for me: I can try booting old kernels until I reproduce the problem, if that can be of any use. But I can do this only on Monday next week.

ciao
gabriele
Comment 11 Frank Schaefer 2013-08-07 15:33:42 UTC
It definitely wasn't fixed up to the first 3.10-rc's.
Although it was pretty reproducible in my case, it sometimes worked fine for a few minutes, so a quick test isn't enough.

I doubt it will ever be fixed again. That's why I'm on Radeon hardware now.
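
Since the hang could take minutes to show up, one clean run does not prove a kernel good. A minimal Python sketch of a stricter verdict, where a kernel counts as bad on the first failing trial and as good only if every trial survives. The `run_test` callable, the trial count and the canned outcomes are all hypothetical stand-ins for real test sessions.

```python
def classify(run_test, trials=5):
    """Return 'bad' on the first failing trial, 'good' only if all pass.

    run_test() should return True when the session survived.
    """
    for _ in range(trials):
        if not run_test():
            return "bad"
    return "good"


# Hypothetical outcomes standing in for real glxgears sessions.
outcomes = iter([True, True, False])
print(classify(lambda: next(outcomes), trials=3))  # bad
```

Even this only lowers the odds of a wrong "good" verdict; how many trials are enough, and how long each should run, depends on how rarely the hang appears.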
Comment 12 Ilia Mirkin 2013-08-27 15:29:39 UTC
The original reporter's issues seem fixed. Feel free to re-open if I've misunderstood. I'm inclined to think that the NV31 issue is unrelated; it looks like the FIFO gets confused on that one.
