Bug 24300

Summary: EXA corruption is back with xorg-server 1.7.0 on RS780
Product: DRI
Reporter: Mikko C. <mikko.cal>
Component: DRM/Radeon
Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: desintegr, jlp.bugs, oldium.pro, orzel, phercek, smoothhound, Tanktalus, tomas.linhart, victor.noel, xorg-driver-ati, zecmerquise
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description | Flags
corruption in Konversation | none
corruption in chrome | none
corruption in chrome | none
corruption in Yakuake | none
Xorg.0.log | none
corruption in Konversation | none
corruption in chrome | none
Xorg.0.log with EXANoDownloadFromScreen | none
Xorg.0.log with 1.6.4 | none
Patch that reverts EXA defragmentation | none
patch to fix corruption | none
Proposed fix | none
Corruption in konqueror with Oldrich's patch | none

Description Mikko C. 2009-10-04 01:12:19 UTC
Created attachment 30031 [details]
corruption in Konversation

After it was fixed in some 1.6.x release, the corruption is back in 1.7.0.
I'm using a Radeon RS780 with kernel 2.6.32-rc1.
All is fine when going back to 1.6.4.
Two screenshots will follow.
Comment 1 Mikko C. 2009-10-04 01:15:46 UTC
Created attachment 30032 [details]
corruption in chrome
Comment 2 Mikko C. 2009-10-04 01:19:33 UTC
If it matters, I'm not using Composite
Comment 3 Michel Dänzer 2009-10-04 04:33:24 UTC
Does

	Option		"EXAOptimizeMigration" "off"

work around the problem?
Comment 4 Mikko C. 2009-10-04 05:54:41 UTC
Created attachment 30035 [details]
corruption in chrome

Not really. It might be slightly better, but I'm not so sure. See this screenshot and the following one. The corruption looks exactly the same as without "EXAOptimizeMigration" "off".
Comment 5 Mikko C. 2009-10-04 05:55:11 UTC
Created attachment 30036 [details]
corruption in Yakuake

This also looks the same as it did before.
Comment 6 Michel Dänzer 2009-10-04 14:32:35 UTC
Please attach the log file from testing the option.
Comment 7 Mikko C. 2009-10-04 23:14:34 UTC
Created attachment 30063 [details]
Xorg.0.log
Comment 8 Michel Dänzer 2009-10-05 01:44:22 UTC
Does

        Option          "EXANoDownloadFromScreen"

work around the problem?

If not, I'm not sure what the problem could be... Please also attach a log file from a working run with xserver 1.6.4, and try isolating the problem if you can - ideally with git bisect, but even just trying the xserver 1.7 RCs could be a start.
Comment 9 Mikko C. 2009-10-05 03:14:37 UTC
Created attachment 30066 [details]
corruption in Konversation

(In reply to comment #8)
> Does
> 
>         Option          "EXANoDownloadFromScreen"
> 
> work around the problem?
>

Not really, it seems to happen less often, but it's definitely still there.
See screenshots.
Comment 10 Mikko C. 2009-10-05 03:15:04 UTC
Created attachment 30067 [details]
corruption in chrome
Comment 11 Mikko C. 2009-10-05 03:17:55 UTC
Created attachment 30068 [details]
Xorg.0.log with EXANoDownloadFromScreen

New Xorg.0.log

I'll get a 1.6.4 Xorg.0.log when I have time to do the downgrade. I'm not sure I'll have the time to do the bisect in the next few weeks tho.
Comment 12 Mikko C. 2009-10-05 04:55:30 UTC
Created attachment 30072 [details]
Xorg.0.log with 1.6.4
Comment 13 Michel Dänzer 2009-10-05 05:40:17 UTC
(In reply to comment #9)
> > Does
> > 
> >         Option          "EXANoDownloadFromScreen"
> > 
> > work around the problem?
> 
> Not really, it seems to happen less often, but it's definitely still there.

What if you add in

        Option          "EXANoUploadToScreen"

as well?
Comment 14 Mikko C. 2009-10-05 08:47:46 UTC
Option          "EXANoUploadToScreen" does not help unfortunately.

But I've noticed another thing: with 1.7.0 my dmesg contains lots of these errors:

[drm:radeon_cp_indirect] *ERROR* sending pending buffer 21
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 2
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 25
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 4
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 12
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 29
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 14
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 11
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 28
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 0
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 12
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 25

So it might be related.
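
To compare runs (for example 1.6.4 against 1.7.0, or before and after a patch), these messages can be tallied from saved dmesg output. The following is only a minimal Python sketch, not part of any driver tooling: the function name count_pending_buffer_errors is made up for illustration, and the pattern is copied from the lines quoted above.

```python
import re
from collections import Counter

# Pattern copied from the dmesg lines quoted in this comment; the capture
# group is the buffer index at the end of each message.
PENDING_RE = re.compile(
    r"\[drm:radeon_cp_indirect\] \*ERROR\* sending pending buffer (\d+)"
)


def count_pending_buffer_errors(dmesg_text):
    """Return a Counter mapping buffer index -> number of matching lines.

    Hypothetical helper for illustration; search() is used so that lines
    carrying a kernel timestamp prefix still match.
    """
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = PENDING_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Small inline sample so the sketch runs without a real dmesg capture.
sample = """\
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 21
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 12
[drm:radeon_cp_indirect] *ERROR* sending pending buffer 12
"""
counts = count_pending_buffer_errors(sample)
print("total errors:", sum(counts.values()))
for buf, n in sorted(counts.items()):
    print("buffer", buf, "seen", n, "time(s)")
```

A total of zero on a patched server, against a non-zero total on an unpatched one under the same workload, would be one rough way to check whether a change makes the errors go away.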
Comment 15 Mikko C. 2009-10-05 09:15:37 UTC
More info: a user on IRC reported that he was having the same "drm:radeon_cp_indirect" errors with radeonHD. He also reported that, at the time, reverting commit 510cbd43cd4e34bd459e8f74ab2855714b4ca95d ("EXA: Defragment offscreen memory.") fixed the issue for him.
Comment 16 Mikko C. 2009-10-21 02:52:33 UTC
Michel, if you say this is a bug in radeon, can you please modify the bug report and cc accordingly? Thanks
Comment 17 Mikko C. 2009-10-24 01:34:12 UTC
This also happens with radeonhd. I get the same corruption and the same message: [drm:radeon_cp_indirect] *ERROR* sending pending buffer

Kernel 2.6.32-rc5 + last patches from drm-next, mesa master, ddx master.
Comment 18 Darin McBride 2009-10-28 11:07:43 UTC
Latest radeon driver + Xorg 1.7.1 = this problem for me (Sapphire Radeon 3870HD, dual monitor, 2.6.30-gentoo-r5).
Comment 19 Victor NOEL 2009-11-17 02:30:51 UTC
Hello, is this linked to bug #24771 ?
Comment 20 Michel Dänzer 2009-11-17 03:00:56 UTC
(In reply to comment #19)
> Hello, is this linked to bug #24771 ?

Possibly, but until it's confirmed it's better to track them separately.
Comment 21 Mikko C. 2009-11-17 04:01:32 UTC
Yes, it looks like exactly the same corruption/bug. Enabling KMS makes this error go away: [drm:radeon_cp_indirect] *ERROR* sending pending buffer

but there are tons of other corruption issues with KMS enabled, so it's not really an option for me.
Comment 22 Alex Deucher 2009-11-23 07:51:14 UTC
This looks to be related to xserver exa changes in xserver 1.7.x.  We are probably ending up with a missing *Done() call in the exa code which results in the command buffer not getting sent when it's supposed to.  I would suggest bisecting the xserver.
Comment 23 Mikko C. 2009-11-23 08:43:52 UTC
If someone wants to start bisecting the xserver, I suggest reading my comment #15 in this report and starting from commit 510cbd43cd4e34bd459e8f74ab2855714b4ca95d, EXA: Defragment offscreen memory.
Comment 24 Oldrich Jedlicka 2009-11-25 10:47:25 UTC
(In reply to comment #23)
> If someone wants to start bisecting the xserver I suggest reading my comment
> #15 in this report and start from commit
> 510cbd43cd4e34bd459e8f74ab2855714b4ca95d, EXA: Defragment offscreen memory.

I've tried to compile xorg-server from the 1.7 branch, but the first version that compiled (with the libs that I have) was 20daa145c437c3ba67970146f6182849f87a1b43, and that one would not start up. I probably need to recompile more things, but that is a no-go for me as I need to use the computer...
Comment 25 Oldrich Jedlicka 2009-11-25 11:49:09 UTC
Created attachment 31477 [details] [review]
Patch that reverts EXA defragmentation

As compiling the whole xorg-server is a no-go, I've tried to revert the first patch that Mikko suggested. There were some conflicts, but I tried to be reasonable while resolving them. I don't understand the code, so it might be wrong :-)

At least it works for me currently, I will write if I encounter corruption again.

So please test it yourself too :-)
Comment 26 Mikko C. 2009-11-26 11:21:36 UTC
I can confirm the attached patch fixes both the corruption and the [drm:radeon_cp_indirect] *ERROR* sending pending buffer.

It applies cleanly to xorg-server 1.7.1
Comment 27 Darin McBride 2009-11-26 11:25:00 UTC
Thanks, that patch seems to have fixed it here, as well.  I've not noticed any
corruption so far today with your patch applied, and normally oowriter would
trigger it so bad as to make it nearly impossible to use.
Comment 28 Michel Dänzer 2009-11-26 11:41:19 UTC
Hmm, interesting. I guess this leaves a few basic possibilities as to what the problem could be:

1. The R6/7xx EXA code might not like something about the way
   ExaOffscreenDefragment() calls its Copy hooks. Can't see anything offhand that
   could be problematic though.

2. The ExaOffscreenDefragment() call in exaOffscreenAlloc() might happen at times
   the driver can't handle it. It should be easy to test this by disabling just
   that call.

3. Some kind of ordering or other issue between the ExaOffscreenDefragment() call
   in ExaWakeupHandler() and the R6/7xx EXA code. Again should be easily testable
   by disabling just that call.

Would be great if you guys could try ruling out 2. or 3. Of course I might be missing something, I'm open for other ideas.
Comment 29 Oldrich Jedlicka 2009-11-26 13:03:40 UTC
Just a note - my patch also (partially) reverts the next commit, "EXA: Allocate from the end of free offscreen memory rather than from the start." The real_size is now most probably calculated wrongly.
Comment 30 Darin McBride 2009-11-26 22:53:42 UTC
Ok, I take that back, somewhat.  I'm still getting corruptions, but not from scrolling anymore.  The patch has improved X to the point of being usable, but not 100%.  And the ERRORs in dmesg have disappeared, even when I do get corruption.  And the corruption I do get is harder to reproduce.
Comment 31 Mikko C. 2009-11-27 03:02:23 UTC
Created attachment 31504 [details] [review]
patch to fix corruption

(In reply to comment #28)
> 2. The ExaOffscreenDefragment() call in exaOffscreenAlloc() might happen at
> times
>    the driver can't handle it. It should be easy to test this by disabling just
>    that call.
> 

I did as you suggested and the corruption is gone with this patch, it applies to xorg 1.7.1.

I also tried your 3rd suggestion but that did not fix it.
Comment 32 Michel Dänzer 2009-11-27 03:14:19 UTC
(In reply to comment #31)
> I did as you suggested and the corruption is gone with this patch, it applies
> to xorg 1.7.1.

This makes 2. very likely to be the problem. My only concern is that your patch still updates pExaScr->lastDefragment, can you try disabling that as well and confirm that it still fixes the problem?

> I also tried your 3rd suggestion but that did not fix it.

That's good to know.
Comment 33 Mikko C. 2009-11-27 03:56:42 UTC
(In reply to comment #32)
> 
> This makes 2. very likely to be the problem. My only concern is that your patch
> still updates pExaScr->lastDefragment, can you try disabling that as well and
> confirm that it still fixes the problem?
> 

Yep, I tried removing that line too and it still works. No errors in dmesg and no corruption.
Comment 34 Alex Deucher 2009-11-27 08:23:51 UTC
Does the defragment code deal with the fact that r6xx+ usually requires a temp surface for overlapping copies?
Comment 35 Michel Dänzer 2009-11-27 08:36:58 UTC
(In reply to comment #34)
> Does the defragment code deal with the fact that r6xx+ usually requires a temp
> surface for overlapping copies?

As r600_exa.c doesn't set the EXA_SUPPORTS_OFFSCREEN_OVERLAPS flag, there should be no overlapping copies from the defragmentation. I suspect the problem is that exaOffscreenAlloc() may trigger defragmentation in the middle of whatever. I'll probably just remove that and only keep the defragmentation at regular intervals.
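
The suspicion above can be illustrated with a toy model. The sketch below is not xserver code: the class ToyOffscreen, its methods and the 100-unit heap are all invented for illustration, and it only records whether a defragmentation pass runs while some other operation is still in flight. It contrasts defragmenting from inside the allocation path with deferring it to an idle tick, which stands in here for defragmenting at regular intervals.

```python
class ToyOffscreen:
    """Toy linear heap. Hypothetical model, not the real EXA allocator."""

    def __init__(self, total, defrag_in_alloc):
        self.total = total
        self.defrag_in_alloc = defrag_in_alloc
        self.areas = []              # [offset, size] pairs, sorted by offset
        self.op_in_flight = False    # True between begin_op() and end_op()
        self.defrags_during_op = 0   # defrag passes that interrupted an op

    def _find_gap(self, size):
        # First-fit search for a free gap of at least `size` units.
        cursor = 0
        for offset, area_size in self.areas:
            if offset - cursor >= size:
                return cursor
            cursor = offset + area_size
        return cursor if self.total - cursor >= size else None

    def defragment(self):
        # Pack all areas towards offset 0, noting if an op was interrupted.
        if self.op_in_flight:
            self.defrags_during_op += 1
        cursor = 0
        for area in self.areas:
            area[0] = cursor
            cursor += area[1]

    def alloc(self, size):
        offset = self._find_gap(size)
        if offset is None and self.defrag_in_alloc:
            self.defragment()        # may run in the middle of an operation
            offset = self._find_gap(size)
        if offset is None:
            return None              # real fallback behaviour is not modelled
        area = [offset, size]
        self.areas.append(area)
        self.areas.sort(key=lambda a: a[0])
        return area

    def free(self, area):
        self.areas = [a for a in self.areas if a is not area]

    def begin_op(self):
        self.op_in_flight = True

    def end_op(self):
        self.op_in_flight = False

    def idle_tick(self):
        # Deferred defragmentation: only runs when nothing is in flight.
        if not self.op_in_flight:
            self.defragment()


def run_scenario(defrag_in_alloc):
    heap = ToyOffscreen(total=100, defrag_in_alloc=defrag_in_alloc)
    a, b, c, d = (heap.alloc(size) for size in (30, 20, 30, 20))
    heap.free(a)
    heap.free(c)                 # 60 units free, but the largest gap is 30
    heap.begin_op()
    during = heap.alloc(40)      # allocation requested in the middle of an op
    heap.end_op()
    heap.idle_tick()
    after = during if during is not None else heap.alloc(40)
    return heap.defrags_during_op, during is not None, after is not None


for flag in (True, False):
    print("defrag_in_alloc =", flag, "->", run_scenario(flag))
```

In this toy setup the first configuration runs one defragmentation pass in the middle of the operation, while the second runs none and still satisfies the request once the idle tick has packed the heap. What the real allocator does when no gap fits, and how exactly an interrupted operation misbehaves on R6xx/R7xx, is not modelled here.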
Comment 36 Michel Dänzer 2009-11-28 03:52:24 UTC
Created attachment 31531 [details] [review]
Proposed fix

This is the fix I'm planning to submit for xserver Git. Please test to make sure it doesn't cause any regressions.
Comment 37 Mikko C. 2009-11-28 04:46:52 UTC
(In reply to comment #36)
> 
> This is the fix I'm planning to submit for xserver Git. Please test to make
> sure it doesn't cause any regressions.
> 

So far I haven't noticed any regressions, using 1.7.1.
Is the fix going to be backported to the 1.7 branch?
Comment 38 Darin McBride 2009-11-30 07:52:19 UTC
Created attachment 31597 [details]
Corruption in konqueror with Oldrich's patch

With Oldrich's patch, I'm still getting corruption.  I've compiled with Michel's patch instead, and as soon as I get the chance to restart X, I will test it out.

I'm attaching the corruption from konqueror.  All I have to do to get rid of it is select another tab and then come back, which makes it seem like it's an upload/download type of issue.
Comment 39 Michel Dänzer 2009-12-01 08:21:04 UTC
Fix landed on master and nominated for server-1.7-branch.
Comment 40 Andriy Gapon 2010-09-27 02:01:31 UTC
Hmm, this bug is marked as resolved/fixed, but I still can reproduce the issue readily with xorg-server-1.7.7 and xorg-server-1.8.2 on FreeBSD (so no KMS).

Committed fix is only a half of what was proposed by Oldrich in comment #25.
I applied the second part too and now the situation seems to be very much improved.
Comment 41 Michel Dänzer 2010-09-27 04:44:57 UTC
(In reply to comment #40)
> Committed fix is only a half of what was proposed by Oldrich in comment #25.
> I applied the second part too and now the situation seems to be very much
> improved.

Oldrich's patch completely removed EXA offscreen memory defragmentation again, which is undesirable as performance will degrade over time due to the fragmentation. Your problem could be due to a bug in the driver or maybe an ordering issue between the EXA Block/WakeupHandlers and those of the driver (if any).
Comment 42 Andriy Gapon 2010-09-27 13:56:36 UTC
Well, I'm not sure what's going on, but the problem is still reproducible for me even after applying Oldrich's full patch.
I'll follow up on my issue in bug #24771.
Comment 43 Andriy Gapon 2011-03-21 01:52:48 UTC
(In reply to comment #42)
> Well, not sure what's going on, but the problem is still reproducible for me even
> after applying Oldrich's full patch.
> I'll follow up on my issue in bug #24771.

Just an FYI - I see the corruption, but it seems to be of a somewhat different kind and caused by Bug 27627.
