33867 – [bisected] Graphics corruption related to pageflip ioctl support in 2.6.38-rc*

Status:	RESOLVED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Radeon
Version:	DRI git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Default DRI bug account
Reported:	2011-02-03 01:39 UTC by Dave Witbrodt
Modified:	2012-07-22 12:58 UTC
CC List:	0 users

Attachments
Commits not present in 2.6.37 applied to local branch (10.25 KB, text/plain) 2011-02-03 01:39 UTC, Dave Witbrodt
Output of 'git bisect log' with drm-airlied/drm-fixes tree (1.41 KB, text/plain) 2011-02-05 11:02 UTC, Dave Witbrodt
xorg.conf with Option "EnablePageFlip" "off" (983 bytes, text/plain) 2011-02-05 16:24 UTC, Dave Witbrodt

Description Dave Witbrodt 2011-02-03 01:39:49 UTC

Created attachment 42887 [details]
Commits not present in 2.6.37 applied to local branch

I am troubleshooting some graphics corruption I noticed when testing the post-2.6.37 commits from drm-airlied git (drm-fixes in this case).  I was trying to produce a kernel as close as possible to the latest stable release (since 2.6.38 is very early in the rc stage) with all of the newest radeon-related features.

This is a preliminary report, and may turn out to be invalid, because the kernel I am using is actually a local branch from v2.6.37 with only the commits from drm-airlied relevant to my hardware individually cherry-picked.  I have bisected the problem down to a specific commit, but if I made any errors during the cherry-pick process then this report is useless.  I plan to confirm that the problem is real tomorrow, by building directly from the drm-airlied/drm-fixes tree (which was the most up-to-date tree I could find today).

As I will report below, there is some similarity with

https://bugs.freedesktop.org/show_bug.cgi?id=33515

and that is the only reason I decided to report my findings before I am really sure there is a problem.  Hopefully this will help Michel and other devs save some time and trouble if my guess (that my bug is related) turns out to be correct.
----------------------

OK, here is what I have done so far:

1.  I made a local git branch based on v2.6.37.
2.  I identified commits I wanted from drm-airlied/drm-fixes (Feb. 2, 2011)
3.  Because of GPU lockups recently cured by Alex Deucher, I picked one particular commit first (1e644d6d, "drm/radeon/kms: re-emit full context state for evergreen blits"); without this, testing would be pointless since my GPU would just lock up when trying any 3D app.
4.  I then cherry-picked the rest of the commits I had chosen, in order from oldest to newest (according to 'git log').  See attached file, 'applied-cherry-picks.txt', if interested in specifics.
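
The steps above can be sketched as shell commands.  This is only a minimal sketch, not the exact commands used for this report:  the branch name 'radeon-test', the helper 'list_picks', the output file 'picks.txt', and the use of drm-airlied/drm-fixes as a remote-tracking ref are all assumptions.  The --topo-order flag is added so that every parent is listed before its children, which a date-sorted 'git log' does not guarantee.

```shell
#!/bin/sh
# list_picks BASE TIP [PATH...]
# Print the non-merge commits in BASE..TIP that touch the given paths,
# oldest first, with every parent listed before its children.
# (Hypothetical helper, named here for illustration only.)
list_picks() {
    base=$1
    tip=$2
    shift 2
    git rev-list --reverse --topo-order --no-merges "$base..$tip" -- "$@"
}

# Usage inside a kernel clone (branch, ref and file names are illustrative):
#   git checkout -b radeon-test v2.6.37
#   git cherry-pick 1e644d6d        # the lockup fix goes in first
#   list_picks v2.6.37 drm-airlied/drm-fixes drivers/gpu/drm > picks.txt
#   # review picks.txt by hand, then 'git cherry-pick <sha>' for each kept line
```

Listing candidates this way only orders them; deciding which commits are relevant to the hardware is still a manual step.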

Everything actually seemed to be working fine; I only happened to notice a small glitch.  I use a locally-built game called 'prboom-plus' with my original Ultimate DOOM WAD file to test Radeon support (in kernels, Mesa, xf86-video-ati, etc.).  When a game begins, there is a melt-down animation which transitions into the new game; there is a similar (I would have guessed identical before) melting effect when the player is killed but hits the space bar to restart the game.  In this second melting effect, after being killed, the "melted" part of the screen was all black, a clear regression.

Before today's testing, I had been using a straight 2.6.37 kernel with the patch from Alex I mentioned in item 3 above.  With that kernel, both melting effects work fine.  That seems to rule out a problem with xorg-server, mesa, libdrm, or xf86-video-ati.

I decided to try bisecting the issue.  I built kernels from the first and last commits listed in the attached file:  1e644d6d and dca0d612.  The former was "good" and the latter "bad".  The bisect jumped to 204663c4 and 18007401 next -- both were "bad".  The next jump, 65705962, caused 'prboom-plus' to hang during the second kind of melting; I was able to SSH into the machine and 'kill -9' the game.

Since there was a series of TTM related changes in that series of commits, I used 'git bisect skip' until I was presented with a commit before or after the TTM series; I tested the kernels (all hung on the second melt), but told git to skip them; the ones I pretended to skip were eba67093, 95762c2b, and 702adba2.  I was offered 147666fb, and it was "bad" but did not hang during the second kind of melt.

Continuing, the bisect took me back to 2357cbe5, which hung X so that it could not be killed with 'kill -9'; I had to reboot via SSH.

I skipped ecf7ace9 and 68c4fa31 without building kernels.  The bisect then went to d6ea8886 and b6724405, which hung (so I pretended to skip them).  I skipped 96726fe50 without building.  The bisect moved to 3e4ea742, which hung, and 6f34be50, which also hung.

Interestingly, f5a80209 was perfectly OK.  This meant that the last commit I tested was the first bad commit.  I had been skipping the commits causing hangs because I expected the hang to be a temporary problem resolved somewhere in the middle of the list; I believed this because of the fact the first and last commits in my list did _not_ produce hanging kernels.  Of course, I was actually building most of the kernels that I "skipped", so I knew they were "bad" even if git did not.

To sum up, the first three cherry-picks (applied to v2.6.37) were fine:

  1e644d6dce366a7bae22484f60133b61ba322911
  drm/radeon/kms: re-emit full context state for evergreen blits

  27641c3f003e7f3b6585c01d8a788883603eb262
  drm/vblank: Add support for precise vblank timestamping.

  f5a8020903932624cf020dc72455a10a3e005087
  drm/kms/radeon: Add support for precise vblank timestamping.

The commit which introduced the hangs -- and these happened entirely predictably and consistently: when the second kind of "melt" in 'prboom-plus' occurs -- was

  6f34be50bd1bdd2ff3c955940e033a80d05f248a
  drm/radeon/kms: add pageflip ioctl support (v3)

After a series of several TTM-related commits, the hanging was resolved and the second kind of melt would display in all black instead, beginning at this commit:

  147666fb3b93b8c484f562da33a37f886ddff768
  drm/radeon: Use the ttm execbuf utilities


As I mentioned above, these results are preliminary; I will try builds directly from drm-airlied tomorrow since we are snowed-in here in Michigan, and I don't have to work tomorrow.

My problems with hangs sound very similar to f.d.o. bug #33515.  This may also be related to #33418, since that user (Erdem) is using 2.6.38-rc1.


Hardware:  Radeon HD 5750 (Evergreen JUNIPER)

Software:
  kernel 2.6.37 + commits as described above
  libdrm 2.4.23
  xorg-server 1.9.3.902
  xf86-video-ati 6.13.99 (git commit 3dc28c86 of Jan. 27)
  mesa 7.11.0 (git commit 11b15c4d of Jan 30), r600g

Comment 1 Dave Witbrodt 2011-02-03 01:56:35 UTC

I would also like to mention that, other than the 'prboom-plus' all-black melt glitch, everything is working fine.  Even 'torcs', which has given me a great deal of problems in the past when testing cutting-edge drivers/software, runs better than I've ever seen it (since I used the 'nvidia' blob with my long-dead GeForce 7950GT).  I have never been able to play the "Forza" track in 'torcs' with an open source driver -- the frame rate was 8 fps or below -- until now.  Other tracks are mostly playable, though their frame rate was capped at 30 fps or less ever since vline was added to xf86-video-ati; now those rates are 25-35 fps minimum, and frequently reach the 50's (even 60's occasionally).

So, I only discovered the issue with the hangs because I was trying to bisect a minor glitch.  Someone is really doing something right with the open source support, because just since the last week of January the performance I'm seeing on this HD 5750 seems 50-150% faster!  I don't know what caused it:  I have built newer versions of the kernel, xf86-video-ati, xorg-server, and mesa during that time.  Maybe this has something to do with it:

commit 8c631cfeae29b5236928f759e222aa35e6e4984c
Author: Marek Olšák <maraeo@gmail.com>
Date:   Fri Jan 28 22:04:09 2011 +0100

    r600g: rework vertex buffer uploads
    
    Only upload the [min_index, max_index] range instead of [0, userbuf_size].
    This an important optimization.
    
    Framerate in Lightsmark:
    Before: 22 fps
    After: 75 fps
    
    The same optimization is already in r300g.

Whoever is actually to blame, I would just like to thank all of you who are working on this stuff!

Comment 2 Dave Witbrodt 2011-02-03 19:46:40 UTC

To determine whether I goofed something up with the kernel I described in my preliminary report, I built kernels from the drm-airlied/drm-fixes git tree corresponding to the important commits in the bisection process:

1. f5a8020903932624cf020dc72455a10a3e005087
drm/kms/radeon: Add support for precise vblank timestamping.

2. 6f34be50bd1bdd2ff3c955940e033a80d05f248a
drm/radeon/kms: add pageflip ioctl support (v3)

3. 147666fb3b93b8c484f562da33a37f886ddff768
drm/radeon: Use the ttm execbuf utilities

4. 1e644d6dce366a7bae22484f60133b61ba322911
drm/radeon/kms: re-emit full context state for evergreen blits

Kernel #1 was identified by 'git bisect' as the last "good" kernel, while kernel #2 was the first "bad" kernel (hangs 'prboom-plus'). During the bisection, I found that several commits after kernel #2, the problem with the test program hanging was resolved, but the melt/fade effect when playing again after being killed was now all black; the first commit which caused this change was #3. (Kernel #4 is simply the kernel I tried first, where I first noticed the problem with the black melts/fades.)

Today I built kernels directly from drm-airlied/drm-fixes (instead of the drawn out process I used before). Here is a table comparing the results from my preliminary report and my builds directly from drm-fixes:

Kernel #   Preliminary    drm-fixes
--------   -----------    -----------
1          OK             OK
2          hangs          hangs
3          black fades    OK
4          black fades    black fades

The difference in case #3 is baffling. It is possible that I missed some important commit(s) when trying to cherry-pick the relevant stuff into stable 2.6.37; however, those first 3 kernels (when taken from drm-fixes) are actually based on 2.6.37-rc{2,3}, so the rest of the kernel was very different from the stable 2.6.37 sources I was adding cherry-picks to.

At any rate, it looks like the hangs were resolved in the commits between #2 and #3. This suggests that I can bisect drm-fixes and find the commit that introduced the black fades/melts in prboom-plus. I have not attempted that yet, but I will do so now.

I also discovered that the kernels with the black fades/melts dump messages like this into dmesg (and syslog):

[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!

That looks significant: none of the other kernels exhibit that behavior. (Off to bisecting again....)
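
The numeric code in that message can be pulled out of the log and given a name.  What follows is a minimal sketch, assuming the message format is exactly as quoted above; the helper names 'reloc_errno' and 'errno_name' are made up for illustration.  On x86-64 Linux, errno 35 is EDEADLK, so the kernel is reporting -EDEADLK from relocation parsing.  Whether that is connected to the black melts is not established by this sketch.

```shell
#!/bin/sh
# reloc_errno: read kernel log text on stdin and print the numeric code from
# each "Failed to parse relocation" line.  (Hypothetical helper.)
reloc_errno() {
    sed -n 's/.*\[drm:radeon_cs_ioctl\] \*ERROR\* Failed to parse relocation \(-\{0,1\}[0-9][0-9]*\)!.*/\1/p'
}

# errno_name: name the one code seen in this report (x86-64 Linux numbering).
# Any other code is passed through unnamed.  (Hypothetical helper.)
errno_name() {
    case "${1#-}" in
        35) echo EDEADLK ;;
        *)  echo "errno ${1#-}" ;;
    esac
}

# Usage:
#   dmesg | reloc_errno | sort | uniq -c    # how often each code appears
#   errno_name -35
```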

Comment 3 Dave Witbrodt 2011-02-05 10:57:03 UTC

Part 1: Improving on what I've reported so far

OK, I've spent a couple of days trying to understand my situation with this bug. My previous comments here were very preliminary, so I will now try to improve on the quality of information I have provided so far. Everything (kernel DRM, Mesa, X server, Radeon driver) is working so well that it may be a complete fluke that I noticed this glitch at all.

I begin with some details about the glitch I am seeing. I am not proficient enough with screen capture software to grab stills or videos of the glitch on my system, but I have found some material on the net useful for illustrating what I am seeing.

Of all the software I use for testing, only one program (so far) reveals this bug: a locally-built version of 'prboom-plus'

http://prboom-plus.sourceforge.net

(I use it with the Ultimate DOOM WAD file which I purchased almost 20 years ago!)

DOOM used an animated fading/melting transition when starting a new game and when starting over after being killed. I'm sure you all are familiar with it; it looked like this:

http://doom.wikia.com/wiki/File:Screen_melt.gif

The 'prboom' and 'prboom-plus' programs attempt to clone such effects. Before my latest kernel upgrades, these fades/melts worked correctly.

When I decided to try out the latest Radeon DRM bits (from drm-airlied/drm-fixes) I found everything to be working great -- games, web browsing, no desktop glitches, etc. -- until I tested 'prboom-plus'. That game worked fine, except a specific fade/melt transition (not the one when the game first starts, but the one after your player is killed and you restart on the same level) was all black on the melting part of the screen. This 7-second YouTube clip looks a lot like my bug -- I'm _only_ talking about the first half second of the clip:

http://www.youtube.com/watch?v=gDaSE8U7oEo

Since only the kernel had changed when I first noticed this all-black animation regression, I blamed it on the kernel. I am no longer certain that the kernel is to blame; new information (see later comments) makes me wonder whether it is a bug in xf86-video-ati.

At this point I began bisecting the kernel I was using: I had created a git branch from v2.6.37 and had cherry-picked the new commits that seemed relevant to my hardware, as described above in my original report here, and in Comment 2.

I ended Comment 2 as I was about to begin bisecting directly from the drm-fixes tree, in case I had botched my cherry-picks; my next comment here will pick up after that.

Comment 4 Dave Witbrodt 2011-02-05 10:58:38 UTC

Part 2: pre-rc1 and early rc* kernels suck

In Comment 2 above, I built 4 kernels directly from drm-fixes. I was guided in those choices by the results of bisecting the kernel I had made from individual cherry-picks, believing that the order in which commits appear in 'git log' is the same as the order used during 'git bisect'. (That assumption turns out to have been wrong.)

Anyway, I immediately ran into problems bisecting because the kernels you get from drm-fixes at specific SHA1 commits are in widely-varying states of usability. (That's why I was trying to cherry-pick stuff onto a stable release of 2.6.37 in the first place!) Based on my findings with the 4 kernels in Comment 2, I bisected this way:

git bisect start 1e644d6d 147666fb

There were 5000+ commits between those 2, and the very first commit chosen for me in between these 2 was something from the post-2.6.37 merge window: that kernel would just hang during boot.

SUGGESTION: this made me wish there was a better way to test new DRM code. For example, make it possible to test against kernels where other parts of the kernel are likely to be in good shape. Wouldn't it be relatively easy to either:

A) have an additional git tree, based on the last stable kernel
release, with only new DRM-related patches (and no explosive
merge window or rc* detritus)?

or B) provide a series of patches so that people wanting to help
debug this stuff can have kernels that are fairly stable except
for the new experimental code being tested?

I love git, but this aspect -- using in-flux developer trees for bisecting -- is very bad for user testing. An "end-user-bisect" tree, rebased at each stable kernel release, containing only new DRM-related code patches, might be a great improvement over the current situation. (Unless having end users running 'git bisect', and spamming the f.d.o. Bugzilla, is something you devs are intentionally trying to avoid....)

I know this would mean even more burden on already overtaxed devs, but it's probably something that could be automated if devs who submit patches to D. Airlie bought into supporting it. Making a kernel from individual cherry-picks after gazing at 'git log' changes since the last stable release is much more difficult, but it's something I've been doing for the past year... I'm just a noob! :-)

Comment 5 Dave Witbrodt 2011-02-05 11:02:57 UTC

Created attachment 42968 [details]
Output of 'git bisect log' with drm-airlied/drm-fixes tree

Part 3:  My useless bisect of drm-fixes


I started over, after having had my butt kicked by that evil merge-window kernel.  This time I ran:

    git bisect start 1e644d6d 147666fb -- drivers/gpu/drm/radeon

This posed the potential issue that the commit I was trying to find was not something that touched file(s) in d/g/d/radeon.  If that was the result I was going to bisect again without specifying a directory, but it actually turned out fine.  The bisect log is attached.

When I originally viewed 'git log' to select cherry picks, I saw an ordering like this:

[NEWEST]
...
1e644d6d  drm/radeon/kms: re-emit full context state for evergreen blits
...
147666fb  drm/radeon: Use the ttm execbuf utilities
...
6f34be50  drm/radeon/kms: add pageflip ioctl support (v3)
f5a80209  drm/kms/radeon: Add support for precise vblank timestamping.
...
[OLDEST]

I applied the individual cherry-picks in order from old to new, assuming later patches would depend on earlier ones to avoid conflicts.  In Comment 2 above, kernel 3 (147666fb) was OK and kernel 4 (1e644d6d) caused the black fades/melts.

The 'git bisect' process then surprised me:  in spite of the order the commits appear in 'git log', the bisection found f5a80209 and 6f34be50 _between_ 147666fb and 1e644d6d !  This explains why I was baffled in Comment 2 above:  147666fb was "good" because it is actually before f5a80209 (the last commit before problems begin) in the git history.

These results correspond exactly to what I had found in my original (preliminary) report here.
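
The mismatch between the order shown by 'git log' and the actual history can be checked directly.  This is a minimal sketch, assuming it is run inside a clone that contains these commits; the helper names 'is_ancestor' and 'in_range' are made up for illustration, and no particular result is claimed here for the commits in this report.

```shell
#!/bin/sh
# is_ancestor A B: succeed if commit A is reachable from commit B.
# (Hypothetical helper.)
is_ancestor() {
    git merge-base --is-ancestor "$1" "$2"
}

# in_range GOOD BAD C: succeed if commit C is one that a bisect between
# GOOD and BAD could land on, i.e. reachable from BAD but not from GOOD.
# (Hypothetical helper.)
in_range() {
    is_ancestor "$3" "$2" && ! is_ancestor "$3" "$1"
}

# Usage:
#   in_range 147666fb 1e644d6d 6f34be50 && echo "6f34be50 is inside the bisect range"
#   git rev-list --count 147666fb..1e644d6d      # how many commits the bisect covers
#   git log --oneline --graph 147666fb..1e644d6d -- drivers/gpu/drm/radeon
```

The --graph view is the useful part:  it shows which commits sit on parallel branches that were merged later, something a flat date-sorted listing hides.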

This seems to point firmly at one commit:

    6f34be50  drm/radeon/kms: add pageflip ioctl support (v3)

However, I have many doubts.  Looking at the list of commits I originally used (see my first attachment here) there are two complex series involved:

6f34be50  drm/radeon/kms: add pageflip ioctl support (v3)
3e4ea742  drm/kms/radeon: Reorder vblank and pageflip interrupt handling.
b6724405  drm/kms/radeon: Use high precision timestamps for pageflip 
                          completion events.

    and

d6ea8886  drm/ttm: Add a bo list reserve fastpath (v2)
ecf7ace9  kref: Add a kref_sub function
2357cbe5  drm/ttm: Use kref_sub instead of repeatedly calling kref_put
68c4fa31  drm/ttm: Optimize ttm_eu_backoff_reservation
96726fe5  drm/ttm: Don't deadlock on recursive multi-bo reservations
702adba2  drm/ttm/radeon/nouveau: Kill the bo lock in favour of a bo device
          fence_lock
95762c2b  drm/ttm: Improved fencing of buffer object lists
65705962  drm/ttm/vmwgfx: Have TTM manage the validation sequence.
eba67093  drm/ttm: Fix up io_mem_reserve / io_mem_free calling

Therefore, I think this bisection was inconclusive:  kernels built from commits where the "pageflip" series is incomplete or the "TTM" series is incomplete cause hangs; but what I am _really_ looking for is the first commit that allows me to build a non-hanging kernel and which also exhibits the graphics corruption.  (Here, by "hangs" I mean that 'prboom-plus' becomes unresponsive instead of performing the all-black fade/melt routine, and I have to use 'kill -9' via SSH.)

For the bisection to succeed, maybe I would have to treat kernels with hangs as "good" (as well as those causing no glitches at all) and only treating _working_ kernels which cause black fades as "bad".

In any event, I suspect now that all of this bisecting after the first one I did (on the 2.6.37 + cherry-picks local branch) has been for nothing.  [See next comment.]

It is still possible that the hangs I saw during bisection might have some relevance for bug #33515 and bug #33418, but I think it is more likely that those were merely artifacts of 'git bisect' landing in the middle of incomplete patch series (the "pageflip" and "TTM" series I listed above).

Comment 6 Dave Witbrodt 2011-02-05 11:04:22 UTC

Part 4:  Request for guidance


I would like to continue helping track this bug down.  It is only causing a minor glitch for me, but it is clearly a regression and may have more of an impact on software I have not tested (or on other people's systems).

The glitch I am observing reminds me of what was reported in bug #33918, specifically the difference between the 2 versions of 'test07.jpg' images.  There is also the DRM error msg (see Comment 2) which appears when 'prboom-plus' exhibits the black fade/melt problem:

    [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!

My bisecting pointed directly at

    6f34be50  drm/radeon/kms: add pageflip ioctl support (v3)

which first causes hangs -- probably irrelevant, since the hangs cease once the patch series is completed -- and then leaves me with the all-black fades/melts once non-hanging kernels appear again after the TTM series.

The driver issue in #33918 made me wonder:  if the pageflip support is actually involved here, what if I go back to an old driver before pageflip support was added, and use the new kernel with the pageflip DRM bits?

I have tested that, and here is my current situation:

    $ uname -r
    2.6.37+drm2.6.38-rc3.110201.desktop.kms

    $ apt-cache policy xserver-xorg-video-radeon | grep "Installed"
      Installed: 1:6.13.99+git101203.f9bbb26

I am running my 2.6.37 + cherry-picks kernel with the radeon driver from the last commit before pageflip support was added.  I get no glitches in 'prboom-plus', and all my test software works just fine.

I hope these results are meaningful.  I don't know how to proceed further, so if I could get some guidance I might be able to help pin the problem down even further.

Comment 7 Alex Deucher 2011-02-05 11:29:37 UTC

When you bisect, you want to focus on only the specific bug you are tracking.  If you hit some other bug or the status of the commit is indeterminate, don't mark the commit as bad, skip it (git bisect skip).  Also, you may need to have a certain patch applied all the time for testing certain things.  E.g., if you get hangs without 1e644d6dce366a7bae22484f60133b61ba322911 applied, make a patch of that commit and manually re-apply it before testing each commit in the bisect.  E.g., create a patch from the commit:
    git show 1e644d6dce366a7bae22484f60133b61ba322911 > fix.patch
then before each test in the bisect, manually apply the patch:
    patch -p1 -i fix.patch
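
One round of that procedure might look like the sketch below.  It is only an illustration under stated assumptions:  the helper name 'bisect_verdict' and the outcome labels (ok, black-melt, hang, no-boot) are invented here to describe what was seen in 'prboom-plus', and the build/boot step is left as a comment because it depends on the local setup.

```shell
#!/bin/sh
# bisect_verdict OUTCOME: translate what was observed with the freshly built
# kernel into the word to pass to 'git bisect'.  (Hypothetical helper; the
# outcome labels are invented for this sketch.)
bisect_verdict() {
    case "$1" in
        ok)            echo good ;;  # second melt renders correctly
        black-melt)    echo bad ;;   # the regression being tracked
        hang|no-boot)  echo skip ;;  # a different failure: this commit cannot be judged
        *)             echo "unknown outcome: $1" >&2; return 2 ;;
    esac
}

# One round, run from the top of the kernel tree:
#   patch -p1 -i fix.patch                  # standing fix on top of the bisect commit
#   ... build, boot, run prboom-plus ...
#   git reset --hard                        # drop the applied fix; untracked fix.patch survives
#   git bisect "$(bisect_verdict hang)"     # here: git bisect skip
```

Resetting the tree before reporting the verdict matters because 'git bisect' has to check out the next commit, and leftover changes from the patch can block that checkout.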

Comment 8 Dave Witbrodt 2011-02-05 11:56:24 UTC

(In reply to comment #7)
> When you bisect, you want to focus on only the specific bug you are tracking. 
> If you hit some other bug or the status of the commit is indeterminate, don't
> mark the commit as bad, skip it (git bisect skip).  Also, you may need to
> have a certain patch applied all the time for testing certain things.  E.g., if
> you get hangs without 1e644d6dce366a7bae22484f60133b61ba322911 applied, make a
> patch of that commit and manually re-apply it before testing each commit in the
> bisect.  E.g., create a patch from the commit:
> git show 1e644d6dce366a7bae22484f60133b61ba322911 > fix.patch
> then before each test in the bisect, manually apply the patch:
> patch -p1 -i fix.patch

Thanks, Alex.  Does this mean you think it would be helpful if I did that bisect again?  Or is it already clear that after the "pageflip" and "TTM" series are finished (both in my cherry-pick kernel and in the drm-fixes bisect) that the kernels work (without locking 'prboom-plus') but cause xf86-video-ati (after f9bbb26) to produce the black melt animation?  It seems like the only thing another bisect would clarify is whether the "pageflip" series or the "TTM" series causes the problem; combined with using an older, pre-pageflip radeon driver, it seems like the problem is narrowed down to pageflip code... either in DRM or in xf86-video-ati.


Anyway, when I made my 2.6.37 + cherry-picks kernel, I did a cherry-pick on 1e644d6d first.  (I mentioned that in the original report; it was numbered step 3.)

When I bisected drm-fixes, I did have to manually apply the 1e644d6d commit as a patch; indeed, I had tested that patch in another bug report here in January, and it still has the file name you provided for it:

    0001-drm-radeon-kms-re-emit-full-context-state-for-evergr.patch

DW

Comment 9 Mario Kleiner 2011-02-05 14:15:29 UTC

Looking at your description and the "look-alike bug" youtube video, i wonder if it could be some synchronization bug in mesa, the ddx or a running desktop compositor instead of a kernel bug. Something for which pageflipping is a necessary condition to show up.

This...

<https://www.crowproductions.de/repos/prboom/branches/prboom-plus-24/prboom2/src/gl_wipe.c>

...i believe is the code responsible for the wipe effect which goes wrong. They take a screenshot of the start screen before the transition into a texture and another screenshot of the end screen after the transition. Then they go through a loop where they draw textured quads, first the fullscreen start-screen texture, then little stripes with bits of the end screen texture.

If the wipe_scr_start_tex texture would contain all-black, you'd get the visual artifacts you describe. That could be because they are capturing an all-black framebuffer instead of the proper one, e.g., because mesa fails to synchronize its framebuffer readback into textures properly with the pageflip, or because it reads from the wrong buffer (pre-pageflip vs. post-pageflip).

Without pageflipping (enabled), there isn't any buffer exchange between front- and backbuffer, so a possible bug in mesa would probably stay hidden.

I do remember that we had to fix some such bugs in the mesa classic driver, also for the framebuffer readback path, i don't know about the status of the gallium version.

You could disable page-flipping via the xorg.conf option "EnablePageFlip" "off" (iirc) and see if that "fixes" the problem without removing any patches. Or if disabling desktop composition makes a difference. One could also mess with that file to see if something changes. E.g., see if adding a glFlush() or glFinish() or some wait for a few hundred msecs before executing the screenshot makes a difference. Or just display the screenshot texture for a few seconds to see if it is indeed a black texture.

-mario

Comment 10 Dave Airlie 2011-02-05 15:43:19 UTC

probably a bug in the game, most likely reading from the back buffer after doing a swapbuffers.

Comment 11 Dave Witbrodt 2011-02-05 16:02:25 UTC

(In reply to comment #9)
> Looking at your description and the "look-alike bug" youtube video, i
> wonder if it could be some synchronization bug in mesa, the ddx or a 
> running desktop compositor instead of a kernel bug. Something for which
> pageflipping is a necessary condition to show up.

In other words, because pageflipping is a feature that never existed before,
pre-existing code may simply fail to produce identical results.  There may not
be anything "wrong" at all, other than the cosmetic change in behavior.

FTR, I am not using a compositing window manager at the moment.  Actually,
strike that:  Xfce's WM, xfwm4, supports compositing features... but I
have always had them disabled.


> This...
> 
> <https://www.crowproductions.de/repos/prboom/branches/prboom-plus-24/
>  prboom2/src/gl_wipe.c>
> 
> ...i believe is the code responsible for the wipe effect which goes 
> wrong. They take a screenshot of the start screen before the transition
> into a texture and another screenshot of the end screen after the 
> transition. Then they go through a loop where they draw textured quads, 
> first the fullscreen start-screen texture, then little stripes with bits
> of the end screen texture.

This file matches the corresponding file of the source I downloaded, built, and
installed on my system.

I have not (yet) done any GL programming.  Can you tell me whether you see a
difference between the way the effect is coded at the beginning of the game and
when a killed player is resurrected?  The reason I ask is this:  the
wipes/fades/melts at the beginning of the game and in the built-in demo have
always worked, in any combination of kernel, DDX, and Mesa I have tried.  Only
the wipe/melt effect after hitting the space bar to start over triggers
problems:  all-black wipe/melt transitions, or game hangs with kernels built at
certain commits (where no kernel should probably be built anyway).

If that is the only function used to do the wipe/melt transitions, why does it
work on some calls but not on others?


> If the wipe_scr_start_tex texture would contain all-black, you'd get the
> visual artifacts you describe. That could be because they are capturing 
> an all-black framebuffer instead of the proper one, e.g., because mesa 
> fails to synchronize its framebuffer readback into textures properly 
> with the pageflip, or because it reads from the wrong buffer (pre-pageflip
> vs. post-pageflip).
>
> Without pageflipping (enabled), there isn't any buffer exchange between front-
> and backbuffer, so a possible bug in mesa would probably stay hidden.
> 
> I do remember that we had to fix some such bugs in the mesa classic driver,
> also for the framebuffer readback path, i don't know about the status of the
> gallium version.

If a Mesa sync problem is actually to blame, instead of a kernel bug or DDX
bug, then why is it so deterministic in behavior?  What I mean is, the
post-death wipe never succeeds, and the game-demo and game-start wipe never
fails.

BTW, if you folks discover that this is no kernel or DDX bug, then I'm
satisfied to have this classified as a wishlist bug:  no other aspect of the
game is affected, and, as can be seen in that YouTube clip, other clones have
simply implemented the wipe effect in all black anyway.  My main concern was
that something more serious was going wrong underneath, possibly a clue to
other bug reports over the past few days.



> You could disable page-flipping via the xorg.conf option
> "EnablePageFlip" "off" (iirc) and see if that "fixes" the problem without
> removing any patches. Or if disabling desktop composition makes a 
> difference.  One could also mess with that file to see if something 
> changes. E.g., see if adding a glFlush() or glFinish() or some wait for a few
> hundred msecs before executing the screenshot makes a difference. Or just
> display the screenshot texture for a few seconds to see if it is indeed a
> black texture.
> 
> -mario

1.  xorg.conf option:  I'll give it a try later
2.  desktop compositor:  disabled throughout
3.  experiments with gl_wipe.c:  having no experience with OpenGL coding, I am
not immediately able to perform these experiments.  I could consider this my
big chance to start learning... but I'm not sure how quickly I can figure
out how to code those experiments.  If the code is quick and easy for you to
write, I could apply patches and test them!

Allow me to point out again (see Comments 2 and 6) that I am getting a DRM
error message which also (at least superficially) points us at kernel commit
6f34be50:

[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!

Admittedly, that message could be unrelated to the glitch I am seeing, but I've
got to admit to being tempted to believe there's a connection....

Comment 12 Dave Witbrodt 2011-02-05 16:24:47 UTC

Created attachment 42981 [details]
xorg.conf with Option "EnablePageFlip" "off"

Taking Mario Kleiner's advice, I added this to xorg.conf:

    Option "EnablePageFlip" "off"

(I also am attaching the file, as a sanity check; I don't believe I have anything nefarious there, but other eyes may see what mine do not.)
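
A quick way to confirm what a configuration file actually sets is sketched below.  It assumes the option is spelled exactly as above and sits on a single line; the helper name 'pageflip_setting' and the file path in the usage note are illustrative.  The sketch does not reproduce the looser option-name matching (case, spaces, underscores) that the X server itself performs.

```shell
#!/bin/sh
# pageflip_setting FILE: print the value given to Option "EnablePageFlip"
# on each uncommented line of FILE (prints nothing if the option is absent).
# (Hypothetical helper.)
pageflip_setting() {
    sed -n 's/^[[:space:]]*Option[[:space:]][[:space:]]*"EnablePageFlip"[[:space:]][[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Usage (path is illustrative):
#   pageflip_setting /etc/X11/xorg.conf
```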

I can report that the all-black wipe/fade issue disappears with the combination of my 2.6.37+cherry-picks kernel and xf86-video-ati 6.14.  Also, the DRM error msg

    *ERROR* Failed to parse relocation -35!

goes away.


HTH,
Dave W.

Comment 13 Dave Witbrodt 2011-02-05 16:33:10 UTC

Here's an observation I missed:

After the black wipe/melt effect, there is a pause -- several hundred millisecs, maybe a half second at the most.  This does not happen when the wipe happens correctly.

Sorry about not noticing that sooner.


DW

Comment 14 Dave Witbrodt 2011-04-11 19:59:28 UTC

I did no further testing after the onset of the Japan crisis, but I have just
started updating some relevant parts of my system.

Updating from kernel 2.6.38-rc8 to 2.6.38.2 did not seem to have an effect, but updating Mesa to 7.11.0-devel (commit a26121f3) changed the observed behavior in prboom-plus.  The black melts are gone, replaced by transient graphics corruption when starting a new game.  This corruption fades within a span of about 1 second, and causes no further problems.

In short, it looks like the problem was a combination of kernel changes and Mesa support for my Radeon HD 5750.  Apparently some changes to Mesa over the past 7 weeks have resolved the problem I was originally reporting here.

Comment 15 Alex Deucher 2011-04-14 10:56:48 UTC

possibly a dupe of bug 35452

Comment 16 Dave Witbrodt 2012-07-22 12:58:52 UTC

I have been tracking this bug since April 2011, and it is finally fixed.  The original problem was an all-black "melting" effect at the beginning of a game of DOOM (using the prboom-plus client); as of comment 14, the melt effect was working correctly, but 3D glitches were occurring:  at the beginning of the game walls and floors would be invisible or would blink, but the bug would disappear after a second or two into the game.  (Actually, any change of the player's "height" in the virtual world would trigger a few moments' worth of the bug.)

The fix happened either in the DRM or Mesa.  I went from a 3.4.5 kernel to 3.4.6, with some cherry-picks from drm-airlied in both; and with Mesa 8.1-devel I went from commit e3ff4d4c to commit e2e7b467.  None of the radeon-related cherry picks in drm-next that I used (between 74da01dc - 197bbb3d) for my kernel 3.4.6 update look relevant.  Promising candidates from Mesa include:

    commit 018e3f75d69490598d61059ece56d379867f3995
    Author: Marek Olšák <maraeo@gmail.com>
    Date:   Sun Jul 15 00:02:42 2012 +0200

        r600g: fix all failing depth-stencil tests for evergreen

    commit ba48f47ebf7f017db0507b92a3ca83e404dc586c
    Author: Marek Olšák <maraeo@gmail.com>
    Date:   Sat Jul 14 16:23:42 2012 +0200

        r600g: consolidate code for setting sampler views and fix bugs in
        the process

    commit 80755ff56317446a8c89e611edc1fdf320d6779b
    Author: Marek Olšák <maraeo@gmail.com>
    Date:   Sat Jul 14 17:06:27 2012 +0200

        r600g: properly track which textures are depth

At any rate, this minor annoyance is now completely gone.  Thanks to everyone working on the open source Radeon support!
