Bug 24977 - [965G] First run of the overlay displays green window
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: low normal
Assignee: Daniel Vetter
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 27447
Depends on:
Blocks:
 
Reported: 2009-11-07 11:28 UTC by maximlevitsky
Modified: 2017-07-24 23:09 UTC

See Also:
i915 platform:
i915 features:


Attachments
overlay debug patch to check register update consistency (2.10 KB, patch)
2009-11-25 07:33 UTC, Daniel Vetter
overlay debug tool (8.30 KB, patch)
2009-11-25 07:52 UTC, maximlevitsky
kernel log (100.30 KB, text/plain)
2009-11-28 08:49 UTC, maximlevitsky
kernel patch against drm-intel-next (1.71 KB, patch)
2009-11-29 05:45 UTC, Daniel Vetter
disable overlay clock gating (1.29 KB, patch)
2010-02-07 03:30 UTC, Daniel Vetter
intel_reg_dump (83.92 KB, text/plain)
2010-05-28 05:06 UTC, Thomas Lindroth
report contents of clock gating regs (720 bytes, patch)
2010-08-07 08:23 UTC, Daniel Vetter

Description maximlevitsky 2009-11-07 11:28:10 UTC
Using drm-intel-next and DRM_MODE_OVERLAY_LANDED in intel driver with all master versions of everything.

When I start the overlay for the first time, I see a green window.
I verified that it's not a colorkey issue, but the actual contents of the window.
This was tested by moving the mplayer window fast (and then I see some of the blue colorkey), and by using compiz with a transformed screen.

Tested with and without compiz.

Any attempt to resize/reopen the window fixes the issue permanently (until the next boot, of course).

Was using:

$ mplayer .video.avi -vo xv:port=94
Comment 1 maximlevitsky 2009-11-07 11:34:03 UTC
This is a G965 system.
Comment 2 Daniel Vetter 2009-11-08 10:55:48 UTC
Thanks for all the bug reports ;)

Let's start with this one. First a few small things to clarify what's
going on:

- You see the green always on first use, even after having moved around
  the window? And as soon as you either resize or hide/show the window, it
  is permanently fixed?

- Can you try with the following option in the device section of your
  xorg.conf:

Option "XvPreferOverlay" "true"

  And retest without specifying a xv port?

- Can you please try a different video player like vlc.

-Daniel
Comment 3 maximlevitsky 2009-11-10 05:29:15 UTC
> - You see the green always on first use, even after having moved around
>   the window? And as soon as you either resize or hide/show the window, it
>   is permanently fixed?

Absolutely true, even if I do a suspend/resume cycle first.

> - Can you try with the following option in the device section of your
>   xorg.conf:
>
> Option "XvPreferOverlay" "true"
>
>   And retest without specifying a xv port?

Well, I added that option...
I dislike that suspend breaks if I have played video even once...
I have set it now to see how totem and vlc behave, though.

> - Can you please try a different video player like vlc.

Yep, totem shows the same problem; vlc, however, does not.
It probably hides/shows the window, though.

Comment 4 Daniel Vetter 2009-11-25 07:33:42 UTC
Created attachment 31474
overlay debug patch to check register update consistency

I've finally got around to creating a debug patch. Can you apply this on top of the latest drm-intel-next kernel and post the resulting dmesg after a few seconds of using the overlay? Not more: if it's broken like I suspect, this will overflow your dmesg buffer pretty fast.

Thx, Daniel
Comment 5 maximlevitsky 2009-11-25 07:51:21 UTC
Sure thing, but I am afraid this won't work

I have written a small tool to read/write the overlay register range, and everything seems to be fine there.

I have a feeling that the problem lies within the actual image buffers. Their mapping in the GART might be broken.

I attach the patch against intel_gpu_tools.
Feel free to add my program there.
Comment 6 maximlevitsky 2009-11-25 07:52:54 UTC
Created attachment 31475
overlay debug tool
Comment 7 Daniel Vetter 2009-11-25 08:33:02 UTC
> --- Comment #5 from maximlevitsky@gmail.com  2009-11-25 07:51:21 PST ---
> Sure thing, but I am afraid this won't work
> 
> I have wriiten small tool to read/write overlay range and everything seems to
> be fine there.

I'm not really convinced. Furthermore my kernel patch checks the
consistency after every frame update (and without somebody else
interfering). And if this patch finds nothing, we are guaranteed that this
part works as intended so that I can go down chasing more unlikely
options.
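
For illustration, the kind of per-frame consistency check described above can be modelled as follows. This is only a minimal Python sketch, not the attached kernel patch: the intended register values and the values read back are represented as plain dictionaries, and the function name `find_inconsistent_regs`, the offsets, and the values are all hypothetical.

```python
def find_inconsistent_regs(expected, readback):
    """Compare intended register values against values read back.

    Both arguments map a register offset to a 32-bit value. Returns a list of
    (offset, expected_value, readback_value) tuples, sorted by offset, for
    every register whose readback differs or is missing.
    """
    mismatches = []
    for offset in sorted(expected):
        want = expected[offset]
        got = readback.get(offset)  # None if the register was not read back
        if got != want:
            mismatches.append((offset, want, got))
    return mismatches


# Hypothetical values for a single frame update.
written = {0x00: 0x00000001, 0x04: 0x12340000, 0x08: 0x000000FF}
seen = {0x00: 0x00000001, 0x04: 0x00000000, 0x08: 0x000000FF}

for offset, want, got in find_inconsistent_regs(written, seen):
    print(f"reg 0x{offset:02x}: expected 0x{want:08x}, read 0x{got:08x}")
```

An empty result after every frame would mean the register update path can be ruled out, which is the point of running such a check before chasing less likely causes.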

> I have a feeling that problem lies within actual image buffers. Their map in
> the gart might be broken.

Quite unlikely - the overlay code uses standard functions to map bo's (and
the hardware has no special path to reach them). If there's something
broken in there, you should see it everywhere else, too.

Furthermore, the corruptions you described didn't sound like cache-flushing
problems with the image data to me. Just to clarify this: Could you
perhaps take a few jpegs (with a camera) of the funny corruptions you see
after a suspend cycle? So that I know _exactly_ what you're talking about.

Thx, Daniel

Comment 8 maximlevitsky 2009-11-28 08:47:45 UTC
And you were right, thanks.

I attach the kernel log.

I turned the overlay on/off a few times, then turned it off and did an s2ram cycle.
After the s2ram cycle, I see more inconsistent registers.

Comment 9 maximlevitsky 2009-11-28 08:49:15 UTC
Created attachment 31538
kernel log
Comment 10 maximlevitsky 2009-11-28 11:22:11 UTC
I have limited (read: no) experience with issues caused by caching, but the GART is not mapped UC, right?

In that case, maybe, although the page was programmed, it is still in CPU memory, which makes the GPU load garbage?

After a few cycles the caches are flushed, but the overlay is already confused.

Comment 11 Daniel Vetter 2009-11-29 05:34:28 UTC
On Sat, Nov 28, 2009 at 11:22:16AM -0800, bugzilla-daemon@freedesktop.org wrote:
> --- Comment #10 from maximlevitsky@gmail.com  2009-11-28 11:22:11 PST ---
> I have limited (read none) experience on issues caused by caching, but the gart
> is not mapped UC, right?

The GART is mapped WC.

> In this case, maybe although page was programmed, it still in cpu memory, which
> makes the GPU load garbage?
At least for the GART mapped overlay regs, this should not happen, because
I flush the wc caches of the cpu (grep for "flush wc" in intel_overlay.c).
But perhaps some caches at the _gpu_ are not flushed properly and contain
stale data.

I've looked at the dmesg and the inconsistent regs after startup and right
after resume are expected. This leaves us with one inconsistency (mapping
4). It looks like this is the very first frame. Might be that this
confuses the overlay. But this does not explain why it does not come up
after a suspend cycle. So my gut tells me that the problem is somewhere
else.

When you test the suspend patch, can you take a few pictures of the
corruptions you see? Perhaps I get an idea about what's broken.

Thanks, Daniel
Comment 12 Daniel Vetter 2009-11-29 05:45:39 UTC
Created attachment 31549
kernel patch against drm-intel-next

There is a small difference between ums and kms in the overlay gpu flushing.
According to the docs, this doesn't do anything on i965, and testing shows it doesn't seem necessary anywhere else. Still, maybe it changes something somehow. Just test this on top of the other stuff I've posted the next time you compile a new kernel.

-Daniel
Comment 13 maximlevitsky 2009-11-29 17:54:36 UTC
I tested all three patches (test, fix for s2ram, and this).
Other than s2ram now always working, the rest is the same.

However, since this bug and the garbage after s2ram probably have the same origin, I can say you fixed half of this.

I will look carefully at how your patch fixed the s2ram issue...
Comment 14 Daniel Vetter 2009-11-30 01:44:42 UTC
> --- Comment #13 from maximlevitsky@gmail.com  2009-11-29 17:54:36 PST ---
> However, since this bug and garbage after s2ram probably have same origin, I
> can say you fixed half of this.

I have that suspicion, too. Right after boot-up, there's not much garbage
in your memory, so you see a uniform green, whereas later on (after
resume) you see more colorful stuff.
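
One way to see why little garbage would give a uniform green: if the buffer being scanned out happens to be zero-filled and is interpreted as YCbCr, all-zero samples convert to green. The sketch below assumes a zero-filled buffer and full-range BT.601 (JFIF) coefficients; neither is confirmed anywhere in this report, and the overlay's actual color conversion may differ.

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range BT.601 (JFIF) YCbCr sample to 8-bit RGB."""
    def clamp(value):
        return max(0, min(255, round(value)))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


# Assumption: zero-filled video memory, i.e. Y = Cb = Cr = 0.
print(ycbcr_to_rgb(0, 0, 0))  # (0, 135, 0): red and blue clamp to 0, green remains
```

Random leftover data in the buffer would instead convert to arbitrary colors, which would match the more colorful corruption seen after resume.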

> Me will see carefully how your patch fixed the s2ram issue...
I've sent patches to intel-gfx, you can look there for the details (and
the explanations of what went wrong).

-Daniel
Comment 15 Daniel Vetter 2010-01-14 04:31:11 UTC
An idea just crossed my mind: What are the dimensions of the video source?
And can you test different resolutions, like from tiny web-videos to HD
resolution.

Thanks, Daniel
Comment 16 Jesse Barnes 2010-02-05 15:11:04 UTC
Assigning to Daniel as this is his area of expertise.
Comment 17 Daniel Vetter 2010-02-07 03:30:27 UTC
Created attachment 33135
disable overlay clock gating

I've thought a bit about the fix you bisected. Maybe clock gating is also to blame for the
green overlay issue after boot-up/suspend? Please test this patch.

And just to confirm: You have an i965G, not an i965GM, right?
Comment 18 maximlevitsky 2010-02-07 09:33:44 UTC
Sure. I have a G965, i.e. a desktop motherboard (DG965RY).

I tested your patch; sadly, it didn't help.

I must say that, apart from this minor problem, the overlay works great now.
A recent fix in the intel driver made the overlay work perfectly with compiz, even on a transformed screen. Sure, the overlay can't be transformed, but it is clipped perfectly.

Texture video also works perfectly, but since I discovered that the overlay works on this chip, I somehow like to use it...

Comment 19 Daniel Vetter 2010-02-07 10:01:17 UTC
> --- Comment #18 from maximlevitsky@gmail.com  2010-02-07 09:33:44 PST ---
> Sure. I have G965, aka desktop motherboard (DG965RY).
> 
> I tested your patch, sadly it didn't help.

Bad. Just to recap the facts: When first using the overlay right after
boot-up or resume, the video shows a solid green. As soon as you move
around the window, the correct picture shows and the problem never again
shows up.

A few questions to clarify what's going on:
* Does the overlay stay green if you don't do anything?
* Does anything else fix the overlay (like using other apps, moving
  other stuff)? Just be careful not to overlap anything over the overlay,
  for this is essentially the same as moving it.
* Anything else that looks odd?

Thanks, Daniel
Comment 20 maximlevitsky 2010-02-07 12:55:56 UTC
Thanks for asking. It is a bit different now:

The only way to make the overlay window green is to do a fresh boot.
Then I can move it or partially obscure it with other windows, and it stays green. As soon as I minimize it, fully obscure it with another window, or move it partially off-screen, the overlay starts working, and it continues to work through both hibernate and suspend cycles.

It seems to stay green forever, although I only waited maybe 20 seconds or so. I will test that again soon.

The above was tested without compiz. With compiz everything is the same, and I can even fully obscure the overlay window without losing the green color.
(I use a very recent intel driver that fixed the overlay color issue.)

I do remember, though, that hibernate used to bring the green window back, but now it doesn't.
Comment 21 Daniel Vetter 2010-02-07 14:48:39 UTC
Something else crossed my mind. Does starting the video, stopping it, and then
restarting it have the same effect? That would be quite easy to implement as
a work-around.
Comment 22 maximlevitsky 2010-02-13 06:17:08 UTC
Nope.

This doesn't affect the green window.

What does affect it is:

-> the move of the window partially offscreen
-> resize
-> hide/unhide

Comment 23 Thomas Lindroth 2010-03-06 02:31:45 UTC
I just want to tell you that I'm also experiencing this problem now, since I upgraded to 2.10.0 and started using the overlay with KMS. The problem is as described: green video at first, until you restart the media player or do any of the other things mentioned here. I am using a 965GM.
Comment 24 Daniel Vetter 2010-03-06 04:02:39 UTC
> --- Comment #23 from Thomas Lindroth <thomas.lindroth@gmail.com>  2010-03-06 02:31:45 PST ---
> I just want to tell you that I'm also experiencing this problem now since I
> upgraded to 2.10.0 and started using the overlay with KMS. The problem is as
> described. Green video at first until you restart the media player or do any of
> the other things mentioned here. I using a 965GM. 

So it looks like both the mobile and desktop variants of 965 are
affected. Can you please take a register dump with the intel_reg_dumper
tool from http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/?

One dump with working ums Xv, one with not-so-well-working kms Xv. I'm now
suspecting that some of the power management stuff prevents the overlay
engine from starting properly. We've already tried some clock gating stuff,
but that didn't help. Perhaps the diff between these dumps gives a clue.

Please take the dump before using the overlay for the first time, because
that seems to make a difference somehow ...
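
Comparing the two dumps can be scripted. The following is a rough sketch that assumes each dump line has the form `NAME: 0xVALUE`, optionally followed by decoded text; the exact output format of intel_reg_dumper is not shown in this report, so the regular expression may need adjusting. The function names, register names, and values below are made up for illustration.

```python
import re

# Assumed line format: optional indentation, register name, colon, hex value.
REG_LINE = re.compile(r"^\s*([A-Za-z0-9_]+)\s*:\s*(0x[0-9a-fA-F]+)")


def parse_dump(text):
    """Return {register_name: value} for every line matching REG_LINE."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def diff_dumps(ums_text, kms_text):
    """Return {name: (ums_value, kms_value)} for registers that differ.

    A register missing from one dump is reported with None on that side.
    """
    ums, kms = parse_dump(ums_text), parse_dump(kms_text)
    return {
        name: (ums.get(name), kms.get(name))
        for name in sorted(set(ums) | set(kms))
        if ums.get(name) != kms.get(name)
    }


# Made-up sample data; real dumps would be read from the two dump files.
ums_dump = "REG_A: 0x00000000\nREG_B: 0x00001000 (enabled)\n"
kms_dump = "REG_A: 0x00000008\nREG_B: 0x00001000 (enabled)\n"

for name, (ums_value, kms_value) in diff_dumps(ums_dump, kms_dump).items():
    print(f"{name}: ums={ums_value:#010x} kms={kms_value:#010x}")
```

Only registers whose values differ between the ums and kms dumps are listed, which keeps the output short enough to review by hand.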

Thanks, Daniel
Comment 25 Daniel Vetter 2010-04-16 01:38:34 UTC
*** Bug 27447 has been marked as a duplicate of this bug. ***
Comment 26 Thomas Lindroth 2010-05-28 05:06:09 UTC
Created attachment 35906
intel_reg_dump

I finally got around to debugging this. I took some register dumps with and without KMS using the 2.9.1 and 2.11.0 xf86-video-intel driver.

The green screen doesn't show up 100% of the time but the hang will always occur sooner or later.

Before it hangs, the blue colorkey will be displayed and after about 2 sec the xserver hangs. During those 2 seconds mplayer prints a lot of:
X11 error: BadAlloc (insufficient resources for operation)
Comment 27 maximlevitsky 2010-07-03 18:53:50 UTC
Note that this still happens.

It's not such a big deal, though.
Comment 28 Chris Wilson 2010-07-16 05:01:00 UTC
Reviewing the kernel overlay code, I found a couple of missing cache flushes of the non-phys bo that may be the suspect here.

My untidy fixes are available at http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=fair-eviction
Comment 29 Daniel Vetter 2010-08-07 08:23:27 UTC
Created attachment 37666
report contents of clock gating regs

I've finally had time to look through the register dumps. Unfortunately, nothing stood out that could sensibly explain what's going on.

So back to shooting in the dark. The attached patch dumps a few additional regs where the errata in the documentation don't agree with the code. Might be worth a shot.

Please apply this kernel patch and play around with the overlay (with kms), then attach your full dmesg to this bug.

Thanks, Daniel
Comment 30 Chris Wilson 2010-08-20 05:16:34 UTC
If you find some time, could you please test http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=overlay as it contains some fixes for the overlay, including a singular missing cache flush on the i965 paths.
Comment 31 maximlevitsky 2010-08-27 08:09:27 UTC
Yes, this branch fixes the problem!
Comment 32 Chris Wilson 2010-09-06 11:34:04 UTC
\o/ thanks for testing.

That branch is now the basis for drm-intel-next, which should be merged for 2.6.37.
Comment 33 Chris Wilson 2010-09-22 09:25:28 UTC
Upstreamed as -next.

