Bug 24820

Summary: [nv15] Vertical screen corruption in X when KMS is enabled
Product: xorg
Component: Driver/nouveau
Version: git
Hardware: x86 (IA32)
OS: Linux (All)
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Reporter: Johannes Obermayr <johannesobermayr>
Assignee: Nouveau Project <nouveau>
QA Contact: Xorg Project Team <xorg-team>
CC: ingo_brunberg
Attachments:
Description                    Flags
dmesg                          none
Xorg.0.log                     none
Xorg.0.log.diff                none
nv15_arbitration.patch         none
arb_debug.patch                none
dmesg                          none
arb_debug2.patch               none
dmesg with arb_debug2.patch    none
nv15_arbitration2.patch        none
arb_rewrite.patch              none
arb_rewrite2.patch             none

Description Johannes Obermayr 2009-10-30 13:32:37 UTC
Hardware: Geforce2 GTS
Drivers, Mesa, kernel modules: git20091029

If I boot with nouveau.modeset=1 X/KDE4 looks like this:

http://www.freeimagehosting.net/uploads/ea8ad8c8cc.jpg

Console (init 3) looks fine.

dmesg and Xorg.0.log follow...
Comment 1 Johannes Obermayr 2009-10-30 13:33:14 UTC
Created attachment 30847 [details]
dmesg
Comment 2 Johannes Obermayr 2009-10-30 13:33:57 UTC
Created attachment 30848 [details]
Xorg.0.log
Comment 3 Pekka Paalanen 2009-10-30 15:44:22 UTC
Just in case, could you reproduce this without nouveau_dri.so?
Thanks.

I'm actually surprised X does not die a horrible death instantly.
Comment 4 Johannes Obermayr 2009-10-30 17:22:38 UTC
Created attachment 30860 [details]
Xorg.0.log.diff

Yes, same behavior without nouveau_dri.so.

I attach a diff for Xorg.0.log.
Comment 5 Francisco Jerez 2009-11-06 19:46:31 UTC
(In reply to comment #0)
> Hardware: Geforce2 GTS
> Drivers, Mesa, kernel modules: git20091029
> 
> If I boot with nouveau.modeset=1 X/KDE4 looks like this:
> 
> http://www.freeimagehosting.net/uploads/ea8ad8c8cc.jpg
> 
Do you get the same effect with smaller resolutions/clocks? (e.g. 1024x768@60Hz).

> Console (init 3) looks fine.
> 
You mean it looks fine before loading nouveau.ko, right?

> dmesg and Xorg.0.log follow...
> 
Comment 6 Johannes Obermayr 2009-11-07 02:43:08 UTC
(In reply to comment #5)
> > 
> Do you get the same effect with smaller resolutions/clocks? (e.g.
> 1024x768@60Hz).

Yes, it is with all resolutions.
800x600   -> 2 corrupted bars
1024x768  -> 3 corrupted bars
1280x720  -> 4 corrupted bars
1920x1080 -> 6 corrupted bars
 
> > Console (init 3) looks fine.
> > 
> You mean it looks fine before loading nouveau.ko, right?

No, I boot with "nouveau.modeset=1 3" and can work on a KMS-enabled console without any problems...

I tried it with "1...2...3...3...4...5...6...7" to see whether anything is moved to another position (as in X/KDE), but everything is in the right place...

Btw. I use git20091105

Btw2. if I use my TNT2 M64, KMS also works fine, but the X/kdm login manager displays only a black background and a white (login) box...

If you want pictures/logs I will provide them...
Comment 7 Francisco Jerez 2009-11-07 06:40:39 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > > 
> > Do you get the same effect with smaller resolutions/clocks? (e.g.
> > 1024x768@60Hz).
> 
> Yes, it is with all resolutions.
> 800x600   -> 2 corrupted bars
> 1024x768  -> 3 corrupted bars
> 1280x720  -> 4 corrupted bars
> 1920x1080 -> 6 corrupted bars
> 

Which clock frequencies were you using exactly? It would also be interesting to know if it works at 24 bit depth or with kernel modesetting disabled.

> > > Console (init 3) looks fine.
> > > 
> > You mean it looks fine before loading nouveau.ko, right?
> 
> No, I boot with "nouveau.modeset=1 3" and can work on a KMS enabled console
> without any problems...
> 
> I tried it with "1...2...3...3...4...5...6...7" to see whether something is
> moved to another position (as in X/KDE) -> but all is on right place...
> 
> Btw. I use git20091105
> 
> Btw2. if I use my TNT2 M64 KMS works also fine but X/kdm login manager displays
> only a black background and a white (login) box...
> 
Logs and config files might be helpful, but probably an unrelated bug.

> If you want pictures/logs I will provide them...
> 

Comment 8 Johannes Obermayr 2009-11-07 07:59:04 UTC
(In reply to comment #7)
> Which clock frequencies were you using exactly? It would also be interesting to
> know if it works at 24 bit depth or with kernel modesetting disabled.

It works with KMS disabled (16 / 24 bit depth).

It also works with 24 bit depth and KMS enabled.

Only with 16 bit and KMS enabled it does not work all time...

Tested resolutions (all 3 cases above):
800x600   @ 76 / 75 / 70.7 / 74.9 / 65.3 / 60.3 / 60 Hz
1024x768  @ 76 / 75.1 / 70.7 / 74.8 / 65.3 / 60 Hz
1280x720  @ 76 / 70.7 / 65.3 / 74.9 / 60 Hz
1920x1080 @ 60 / 59 Hz

> > Btw2. if I use my TNT2 M64 KMS works also fine but X/kdm login manager displays
> > only a black background and a white (login) box...
> > 
> Logs and config files might be helpful, but probably an unrelated bug.
 
I will try it soon. If it is not also "fixed" with 24 bit depth I will post log/config files.
Comment 9 Francisco Jerez 2009-11-07 09:38:24 UTC
Created attachment 31041 [details] [review]
nv15_arbitration.patch

Does the attached patch make any difference?
Comment 10 Johannes Obermayr 2009-11-07 11:27:09 UTC
This patch helps.

Same tests as above -> all modes work...

Only one thing I noticed:
If you have 24 bit depth and KMS enabled, and try changing to a higher resolution within X, the screen gets corrupted.
e.g.:
1920x1080 -> 1280x720 -> 1024x768 -> 800x600 is possible without screen corruption.
1920x1080 -> 1280x720 -> 1920x1080 or 1920x1080 -> 1024x768 -> 1280x720 cause screen corruption which can only be resolved by pressing Ctrl+Alt+Backspace...

But this bug is solved and so I am closing it...

Thanks for your work.
Comment 11 Francisco Jerez 2009-11-08 06:09:44 UTC
(In reply to comment #10)
> This patch helps.
> 
> Same tests as above -> all modes work...
> 

From nv15_arbitration.patch:
> -               min_clwm = 1024 - cbs + 128 * pclk_freq / 100000;
> +               min_clwm = 1024 - cbs + 8;

It would be interesting to know how high you can set that 8 without seeing any sort of corruption. It should be something within [8,221].
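
As a rough illustration of the two lines quoted from the patch (a sketch, not the driver's actual code), the removed and added expressions can be compared as standalone C functions. The function names are hypothetical, the value passed for `cbs` is a placeholder, and treating `pclk_freq` as a pixel clock in kHz is an assumption that the thread does not confirm.

```c
/* Removed line from the patch: the margin grows with the pixel clock.
 * Assumption: pclk_freq is the pixel clock in kHz. */
static int min_clwm_old(int cbs, int pclk_freq)
{
    return 1024 - cbs + 128 * pclk_freq / 100000;
}

/* Added line from the patch: the clock-dependent term becomes the constant 8. */
static int min_clwm_patched(int cbs)
{
    return 1024 - cbs + 8;
}

/* The clock-dependent term on its own, for comparison with the constant. */
static int old_clwm_margin(int pclk_freq)
{
    return 128 * pclk_freq / 100000;
}
```

Under the kHz assumption and with a placeholder `cbs` of 512, `min_clwm_old(512, 65000)` gives 595 while `min_clwm_patched(512)` gives 520. A clock of 172800 kHz would make the old term 221, which matches the upper end of the range mentioned above.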

> Only one thing I noticed:
> If you have 24 bit depth and KMS enabled, and try changing to a higher
> resolution within X, the screen gets corrupted.
> e.g.:
> 1920x1080 -> 1280x720 -> 1024x768 -> 800x600 is possible without screen
> corruption.
> 1920x1080 -> 1280x720 -> 1920x1080 or 1920x1080 -> 1024x768 -> 1280x720 cause
> screen corruption which can only be resolved by pressing Ctrl+Alt+Backspace...
> 
Anything suspicious in the logs when that happens?
Comment 12 Francisco Jerez 2009-11-08 06:15:57 UTC
Created attachment 31045 [details] [review]
arb_debug.patch

This patch makes the driver log the arbitration parameters it's programming the hardware with. Could you please attach the kernel logs you get after running X with it?
Comment 13 Johannes Obermayr 2009-11-08 14:37:51 UTC
I did not have much time for your requests at home today; I will continue compiling and testing on Friday.

All I can say is that 83 is without bars and 86 has bars with 1280x720 and 16 bit depth (I have reduced resolution for more speed...).

So I have to try it with 84 and 85...
Comment 14 Johannes Obermayr 2009-11-14 08:10:25 UTC
Created attachment 31195 [details]
dmesg

(In reply to comment #11 and #12)

Your requested dmesg with a multiplier of 31.

I noticed:
A lower resolution needs a lower multiplier, and
a lower frequency within a resolution also needs a lower multiplier.

With 640x480@60 I can go up to a multiplier of 31 without seeing bars.

Shall I determine the multipliers for other resolutions/frequencies, too?
Comment 15 Johannes Obermayr 2009-11-14 16:12:50 UTC
(In reply to comment #14)
> Created an attachment (id=31195) [details]
> dmesg
> 
> (In reply to comment #11 an #12)
> 
> Your requested dmesg with a multiplier of 31.
> 
> I noticed:
> A lower resolution needs a lower multiplier and
> a lower frequency within a resolution needs also a lower multiplier.
> 
> With 640x480@60 I can go up to a multiplier of 31 without seeing bars.
> 
> Shall I detect multipliers of other resolutions/frequencies, too?
> 

Oh, it is not a multiplier, it is a summand...
(I should look more carefully before posting...)
Comment 16 Francisco Jerez 2009-11-15 14:29:08 UTC
Created attachment 31215 [details] [review]
arb_debug2.patch

I'm afraid I need a bit more information about your card. The patch I'm attaching should print it out to the logs; you just have to run X at 1280x720x16bpp (and the same refresh rate you were using in your previous tests) after applying it to the DRM.

Thanks.
Comment 17 Johannes Obermayr 2009-11-20 08:06:33 UTC
Created attachment 31345 [details]
dmesg with arb_debug2.patch

It is the weekend and I am at home again...

Here is your requested dmesg (1280x720x16bpp@76Hz).

With arb_debug2.patch the KMS-enabled console also looks corrupted.
Comment 18 Francisco Jerez 2009-11-21 09:59:05 UTC
Created attachment 31374 [details] [review]
nv15_arbitration2.patch

This fix is hopefully more correct. It would be interesting to know whether you still see any kind of corruption or flashing pixels at any resolution/depth/refresh rate.
Comment 19 Johannes Obermayr 2009-11-21 12:03:53 UTC
(In reply to comment #18)

With nv15_arbitration2.patch I can use all resolutions/frequencies without flickering bars...

Some other corrupted pixels continue to be visible:

http://www.freeimagehosting.net/uploads/fda04aaf3c.jpg

:s below headline
.s within "white" space between selection dialog and green box  

But they were (and are) also there without KMS enabled...
I may open a new bug for that...

I am adding Ingo Brunberg to CC (Bug #24800).
He may want to test nv15_arbitration2.patch and post his "experience" with it.

From my side many thanks to Francisco for his great work.
Comment 20 Francisco Jerez 2009-11-21 13:35:28 UTC
(In reply to comment #19)
> (In reply to comment #18)
> Some other corrupted pixels continue to be visible:
> 
> http://www.freeimagehosting.net/uploads/fda04aaf3c.jpg
> 
> :s below headline
> .s within "white" space between selection dialog and green box  
>
That doesn't seem to be a scanout effect, can you still see the artifacts when you look at a screenshot?
Comment 21 Johannes Obermayr 2009-11-21 14:11:57 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > (In reply to comment #18)
> > Some other corrupted pixels continue to be visible:
> > 
> > http://www.freeimagehosting.net/uploads/fda04aaf3c.jpg
> > 
> > :s below headline

are mostly visible (also in other KDE apps)

> > .s within "white" space between selection dialog and green box  

are only sometimes visible

> >
> That doesn't seem to be a scanout effect, can you still see the artifacts when
> you look at a screenshot?
> 

In a snapshot taken with KSnapshot I can see the artifacts, too (if they are visible on the "normal" screen at the moment of the snapshot).
Comment 22 Francisco Jerez 2009-11-23 03:57:37 UTC
Created attachment 31409 [details] [review]
arb_rewrite.patch

I think I'll go with a rewrite of the involved code like the patch I'm attaching instead of trying to fix the old code (It's nv legacy and it had some issues besides the one you hit). Maybe you want to give it a try?
Comment 23 Francisco Jerez 2009-11-23 05:09:50 UTC
Created attachment 31412 [details] [review]
arb_rewrite2.patch

I suspect nv15 has a shorter CRTC FIFO. You may get distorted output with the last patch, but this one should fix it.
Comment 24 Francisco Jerez 2009-11-23 06:14:15 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > (In reply to comment #18)
> > > Some other corrupted pixels continue to be visible:
> > > 
> > > http://www.freeimagehosting.net/uploads/fda04aaf3c.jpg
> > > 
> > > :s below headline
> 
> are mostly visible (also in other KDE apps)
> 
> > > .s within "white" space between selection dialog and green box  
> 
> are only sometimes visible
> 
> > >
> > That doesn't seem to be a scanout effect, can you still see the artifacts when
> > you look at a screenshot?
> > 
> 
> With a Snapshot via KSnapshot I can see the artifacts, too. (If visible on
> "normal" screen at time when "KSnapshotting").
> 

Does it go away if you set this Section "Device" option in your xorg.conf?
> Option "EXANoComposite" "on"
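
For reference, a minimal sketch of where that line would sit in xorg.conf. The identifier "Card0" is only a placeholder and an existing Device section may carry other entries; only the Option line comes from the request above.

```
Section "Device"
    Identifier "Card0"                  # placeholder name (assumption)
    Driver     "nouveau"
    Option     "EXANoComposite" "on"    # the option asked about above
EndSection
```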

Comment 25 Johannes Obermayr 2009-11-27 10:19:23 UTC
A new friday - a new status report:

(In reply to comment #23)
> last patch but this one should fix it.
> 

I tried it with arb_rewrite2.patch and also with

[PATCH 1/3] drm/nouveau: Update the CRTC arbitration parameters on FB depth switch
[PATCH 2/3] drm/nouveau: Clean up the arbitration parameters calculation code.
[PATCHv2 3/3] drm/nv10-nv20: CRTC arbitration code rewrite.

X works but console has flickering artifacts with both.
(see http://www.freeimagehosting.net/uploads/0f34a8e515.jpg)

(In reply to comment #24)
>> With a Snapshot via KSnapshot I can see the artifacts, too. (If visible on
>> "normal" screen at time when "KSnapshotting").
>> 
>
>Does it go away if you set this Section "Device" option in your xorg.conf?
>> Option "EXANoComposite" "on"

Yes, but there are other artifacts: E. g. bottom left and bottom right "Position" line or "Gamma" with corrupted last "a" and "Mehrere Monitore" with corrupted last "e". 
(see http://www.freeimagehosting.net/uploads/d94750f7aa.png)

Do you need a current dmesg?
Comment 26 Francisco Jerez 2009-11-28 06:16:34 UTC
(In reply to comment #25)
> A new friday - a new status report:
> 
> (In reply to comment #23)
> > last patch but this one should fix it.
> > 
> 
> I tried it with arb_rewrite2.patch and also with
> 
> [PATCH 1/3] drm/nouveau: Update the CRTC arbitration parameters on FB depth
> switch
> [PATCH 2/3] drm/nouveau: Clean up the arbitration parameters calculation code.
> [PATCHv2 3/3] drm/nv10-nv20: CRTC arbitration code rewrite.
> 
> X works but console has flickering artifacts with both.
> (see http://www.freeimagehosting.net/uploads/0f34a8e515.jpg)
> 

It seems your FIFOs are underflowing because of some unknown source of latency.
If you have applied the patch series I sent to the mailing list, nouveau_calc.c:191 will read:

>         fifo->lwm = min_lwm + 5 * (max_lwm - min_lwm) / 100; /* Empirical. */

That "5" is enough for all the cards I have, but you'll need a higher value, probably something within [5,40] (don't bother to find a very accurate lower bound, a +/-5 estimate would be fine).
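
To make the role of that "5" concrete, here is a minimal sketch of the same interpolation as a standalone C function. The function name `lwm_at_percent` and any watermark bounds passed to it are hypothetical and are not taken from nouveau_calc.c; only the shape of the expression comes from the line quoted above.

```c
/* Place the low watermark `percent` of the way from min_lwm to max_lwm,
 * using the same integer arithmetic as the quoted line (where percent is 5).
 * Hypothetical helper for illustration; not part of the driver. */
static int lwm_at_percent(int min_lwm, int max_lwm, int percent)
{
    return min_lwm + percent * (max_lwm - min_lwm) / 100;
}
```

With made-up bounds of 100 and 500, `lwm_at_percent(100, 500, 5)` yields 120, a value of 10 yields 140, and the top of the suggested range, 40, yields 260. Raising the constant therefore moves the low watermark further above `min_lwm`.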

> (In reply to comment #24)
> >> With a Snapshot via KSnapshot I can see the artifacts, too. (If visible on
> >> "normal" screen at time when "KSnapshotting").
> >> 
> >
> >Does it go away if you set this Section "Device" option in your xorg.conf?
> >> Option "EXANoComposite" "on"
> 
> Yes, but there are other artifacts: E. g. bottom left and bottom right
> "Position" line or "Gamma" with corrupted last "a" and "Mehrere Monitore" with
> corrupted last "e". 
> (see http://www.freeimagehosting.net/uploads/d94750f7aa.png)
> 

The recent changes in the acceleration code your card uses may be the cause of this. You could try going back to the DDX commit 5587f40, keeping EXANoComposite set (note that you'll also have to downgrade libdrm, I think 83a35b6 is old enough).

> Do you need a current dmesg?
> 

No, unless you've seen something suspicious (e.g. PGRAPH errors).
Comment 27 Johannes Obermayr 2009-11-28 07:11:18 UTC
(In reply to comment #26)
> It seems your FIFOs are underflowing because of some unknown source of latency.
> If you have applied the patch series I sent to the mailing list,
> nouveau_calc.c:191 will read:
> 
> >         fifo->lwm = min_lwm + 5 * (max_lwm - min_lwm) / 100; /* Empirical. */
> 
> That "5" is enough for all the cards I have, but you'll need a higher value,
> probably something within [5,40] (don't bother to find a very accurate lower
> bound, a +/-5 estimate would be fine).
> 

"10" works for me...

Thanks again for your work and patience...
Comment 28 Francisco Jerez 2009-11-28 07:54:07 UTC
(In reply to comment #27)
> (In reply to comment #26)
> > It seems your FIFOs are underflowing because of some unknown source of latency.
> > If you have applied the patch series I sent to the mailing list,
> > nouveau_calc.c:191 will read:
> > 
> > >         fifo->lwm = min_lwm + 5 * (max_lwm - min_lwm) / 100; /* Empirical. */
> > 
> > That "5" is enough for all the cards I have, but you'll need a higher value,
> > probably something within [5,40] (don't bother to find a very accurate lower
> > bound, a +/-5 estimate would be fine).
> > 
> 
> "10" works for me...
> 
> Thanks again for your work and patience...
> 

Pushed, thanks!
