Using drm-intel-next and DRM_MODE_OVERLAY_LANDED in the intel driver, with master versions of everything. This is a G965 system. After an s2ram cycle, the overlay YUV offsets are wrong: I see a b/w picture with a series of blue/red rectangles and occasional flashes of green, like on a broken TV. It doesn't matter whether the overlay was used before suspend. Note that if the overlay was used, it has to be running during the suspend, otherwise the system hangs on resume; that is a separate bug.
On Sat, Nov 07, 2009 at 11:33:30AM -0800, bugzilla-daemon@freedesktop.org wrote:
> Using drm-intel-next and DRM_MODE_OVERLAY_LANDED in intel driver with all
> master versions of everything.
> This is G965 system.
>
> After doing a s2ram cycle, the overlay YUV offsets will be wrong.
> I will see a b/w picture with series of blue/red rectangles, and occasional
> flashes of green, like on broken TV.
>
> It doesn't matter if overlay was or wasn't used before suspend.
>
> Note that if overlay was used, it has to be running while doing the suspend,
> otherwise system will hang on resume, this is separate bug.

Does this disappear when you resize the window, as it does when using the overlay for the first time? I suspect this is the same problem as the overlay-is-green one in disguise.

-Daniel
No, resizing the window doesn't help at all. I also notice that bursts of green (several lines momentarily turn green) occur when rendering happens in other areas of the screen (like text output in a console).
I did some research on this one.

First of all, the registers are exactly the same before and after the suspend cycle. I wrote a program that dumps all registers from the mmio range and from the gart-mapped overlay page.

Second, I now understand the garbled output much better. The U/V layers aren't shifted like I thought. What happens is that, of the three layers (YUV), some are missing in rectangular areas that are scattered over the overlay window. Places where both U and V are missing are gray, places that miss one of U, V are red/blue, etc.

I also noticed that if I pause the video, the pattern still doesn't halt, but changes dynamically. Also, if I move the window fast enough, I can see the correct picture for a split second.

It looks like the overlay hardware is starved of memory access, isn't it?

Looking through the source, I now understand that all register access happens through the gart-mapped page, except for its address, which is sent through MI_OVERLAY_FLIP, and the gamma correction registers, which are written directly.
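The before/after comparison described above can be sketched as follows. This is not the reporter's program, which is not shown in this thread; it is a minimal illustration that assumes each dump is a text file with one `<offset> <value>` pair per line, both in hex. The function names and the sample offsets are hypothetical.

```python
def parse_dump(text):
    """Parse lines of the form '<offset> <value>' (both hex) into a dict.

    The dump format is an assumption made for this sketch.
    Blank lines and lines starting with '#' are skipped.
    """
    regs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        offset, value = line.split()
        regs[int(offset, 16)] = int(value, 16)
    return regs


def diff_dumps(before, after):
    """Return {offset: (old, new)} for every register whose value differs.

    A register present in only one dump is reported with None on the other side.
    """
    changed = {}
    for offset in sorted(set(before) | set(after)):
        old, new = before.get(offset), after.get(offset)
        if old != new:
            changed[offset] = (old, new)
    return changed


def fmt(value):
    """Format a register value, or mark it as absent from that dump."""
    return "missing" if value is None else f"{value:#010x}"


if __name__ == "__main__":
    # Made-up sample data; the offsets are not real register addresses.
    before = parse_dump("0x30000 0x00000001\n0x30004 0x0000ffff\n")
    after = parse_dump("0x30000 0x00000001\n0x30004 0x0000fff0\n")
    for offset, (old, new) in diff_dumps(before, after).items():
        print(f"{offset:#07x}: {fmt(old)} -> {fmt(new)}")
```

An empty diff only shows that the dumped state matches; state that the dump does not cover, or that reads back differently from how it behaves, would not show up in such a comparison.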
On Fri, Nov 13, 2009 at 02:18:23AM -0800, bugzilla-daemon@freedesktop.org wrote:
> I did some research on this one.
>
> First of all, registers are exactly same before and after suspend cycle.
> I had written a program that dumps all registers from mmio range and from gart
> mapped overlay page.
>
> Secondary, I understand now the garbled output much better.
> This U/V layers aren't shifted like I thought. What happens is that of three
> layers (YUV) some are missing in rectangular areas that are scattered over the
> overlay window. Places where both U and V are missing are gray, places that
> miss one of U,V are red/blue, etc...
>
> also I noticed that if I pause the video, still the pattern doesn't halt, but
> changes dynamically.

Are you always seeing uniform colours (in one rectangular area) or is it sometimes somewhat noisy?

> Also if I move the window fast enough, I could see the correct picture for a
> split second.
>
> It looks like overlay hardware is starved on memory access, isn't it?

Maybe, but if this happens, you should see cacheline-aligned pieces of _lines_ with the wrong color, and not rectangular blocks somewhere in the overlaid image. I suspect there's a memory barrier missing for the gart-mapped overlay regs. 965 works differently there than all previous chips. Unfortunately I haven't had time to cook up a debug patch to check this theory, but I'll do so rsn.

> Looking through the source, I now understand that all register access happens
> through gart-mapped page, except its address that is sent through
> MI_OVERLAY_FLIP.
> and gamma correction registers that are written directly.

Yep, that's correct.
> Are you always seeing uniform colours (in one rectangular area) or is it
> sometimes somewhat noisy?

I usually see one component of the three. Each area is either gray, red or blue and has the correct brightness levels of the original picture. So yes, the colors aren't uniform.

> > Also if I move the window fast enough, I could see the correct picture for a
> > split second.
> >
> > It looks like overlay hardware is starved on memory access, isn't it?
>
> Maybe, but if this happens, you should see cacheline-aligned pieces of _lines_
> with the wrong color, and not rectangular blocks somewhere in the overlaid
> image. I suspect there's a memory-barrier missing for the gart-mapped overlay
> regs. 965 works different there than all previous chips. Unfortunately I
> haven't had time to cook up a debug patch to check this theory, but I'll do so
> rsn.

It is hard for me to follow you, and probably hard for me to explain the output I see. It looks a bit like a checkerboard pattern, very irregular, and changing over time.

> > Looking through the source, I now understand that all register access happens
> > through gart-mapped page, except its address that is sent through
> > MI_OVERLAY_FLIP.
> > and gamma correction registers that are written directly.
>
> Yep, that's correct.
Created attachment 31567: picture of the corruption

I don't have a real camera near me now, so this is a picture taken with a webcam.
I must also note that every area has the correct brightness, but one or more of the components are missing. When graphical output occurs, the pattern changes. If I start a 3D application, the pattern begins to change rapidly and some areas briefly show the correct color. Also, by moving the window, it's possible to see correct colors for a split second.
The picture is very interesting and definitely looks like cache-line sized blocks (again an example of a picture's worth more than a thousand words ...). Can you please count how many pixels _wide_ these blocks are? [Just watch a video at 1:1 resolution, count how many blocks you have and then divide the horizontal size of the video by this. It should yield a nice power-of-two] Also, please attach /proc/cpuinfo (so that I know what's the size of your cpu-cachelines).
Another quick question, just to check: when you stop the video (and have a 3D app running alongside), do the colors keep changing forever or do they settle to something specific after a while (half a minute should do)? If they settle to something specific, please take a picture of that, too (save when everything is fine, of course). -Daniel
I did the tests.

First of all, it doesn't matter whether the video is playing or paused. Any graphical output affects the pattern. Rapid output, as I said, makes it look like noise, but horizontally it is still perfectly aligned. In fact the visibility of the windows doesn't matter either: if I minimize the 3D game, the effect is the same.

The alignment is just like you suspected: the first rectangle is 64 pixels wide, all following ones are 32 pixels wide. I measured with gimp, by placing its window side by side. I measured the length of 16 blocks and got 512 pixels, with a maximum error of ±4 pixels.

I have a Core 2 Duo processor:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz
stepping : 6
cpu MHz : 1596.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips : 4262.88
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Also, zooming in (I use totem) does affect the block size. When zooming in near maximum zoom, wide green rows appear in between the lines. The same happens if I put the overlay partially offscreen.
I've thought about this and it doesn't look like the problem is cache flushing related. Reasons:

- When you stop the video, the image doesn't stabilize. If there is some unflushed (on the cpu) or stale (on the gpu) data in caches, these bad blocks would slowly disappear (faster under load).
- The blocks are 32 pixels wide, i.e. 16 bytes in the U or V plane (UV are subsampled). Your cpu's cacheline size is 64. So it doesn't look like it's the cpu cache messing with the image.
- It might still be the gpu/gtt/agp cache. IIRC intel uses 32/16 byte cachelines there (I'd have to look that up). But the UV planes are in new bo's which have not yet been used by the gpu. So it's unlikely that the gpu caches contain so much stale data.

So I think someone is writing crap over the video image. This is supported by the fact that when you move around the window like crazy, you can see a correct frame. Moving around windows like crazy usually creates quite some load, i.e. this may slow down whatever is writing into the video image. At least slow it down enough so that you're able to see a correct frame for a split second.

I have a new idea for a debug patch, hope to code and test it today.

-Daniel

PS: If you think any of the observations and conclusions in this small summary are wrong, please point it out.
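The block-size arithmetic behind the second point can be written out as a short sketch. It assumes a planar 8-bit YUV format with chroma subsampled by 2 horizontally (one byte per U or V sample), which matches the "32 pixels, i.e. 16 bytes" figure above; the exact overlay format is not stated in this thread, and the function names are hypothetical.

```python
# "clflush size" / "cache_alignment" from the /proc/cpuinfo posted in this thread.
CPU_CACHELINE_BYTES = 64


def block_width_px(total_px, blocks):
    """Average block width from a measurement spanning several blocks."""
    return total_px / blocks


def chroma_bytes_per_block(width_px, h_subsampling=2, bytes_per_sample=1):
    """Bytes one block row occupies in the U or V plane.

    Assumption: planar 8-bit YUV with 2x horizontal chroma subsampling.
    """
    return (width_px // h_subsampling) * bytes_per_sample


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; used as a sanity check on the measurement."""
    return n > 0 and n & (n - 1) == 0


if __name__ == "__main__":
    width = block_width_px(512, 16)   # 16 blocks measured as 512 pixels
    per_block_error = 4 / 16          # +-4 px over 16 blocks
    chroma = chroma_bytes_per_block(int(width))
    print(f"block width: {width} px (+-{per_block_error} px)")
    print(f"power of two: {is_power_of_two(int(width))}")
    print(f"chroma bytes per block row: {chroma}")
    print(f"matches CPU cacheline ({CPU_CACHELINE_BYTES} B): {chroma == CPU_CACHELINE_BYTES}")
```

Measuring across many blocks, as was done here, spreads the ±4 pixel reading error over all 16 blocks, so the per-block width is known to well under a pixel.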
Created attachment 31650: patch against xf86-video-intel

Can you quickly check whether this patch changes anything?
No change at all
As you might have guessed from my silence, I'm running out of ideas ... Just to gather some more information, can you please post your output config (just send the output from xrandr) and your Xorg.log? It doesn't really matter which driver version. Meanwhile I'll try to cook up new ideas to test. -Daniel
Can you also send your .config from the kernel?
Created attachment 32304: xrandr output
Created attachment 32306: xorg log
Created attachment 32307: kernel config
Hi, sorry for the delays. I attached all the information you asked for, although I don't think there is anything very useful in it. Note that I recently found out that a suspend-to-disk cycle brings the GPU to the same state as on boot, that is, a green window on the first run, then normal video. A suspend-to-ram cycle shows the same problem again.
> --- Comment #19 from maximlevitsky@gmail.com 2009-12-26 14:28:05 PST ---
> Hi,
>
> Sorry for delays.
> I attached all information you asked for, although I don't think there is
> anything much useful.

Thanks. I've looked through it but found nothing suspicious (or that could be related to other bug reports).

> Note that I recently found out that suspend to disk cycle, brings the GPU to
> same state as on boot, that is green window of first run, then normal video.
> suspend to ram cycle shows same problem again.

Maybe initializing the gpu by the bios changes something. Dunno.

atm I'm hunting down cache flushing bugs, which might be related to your problem. I'll postpone your report here until I've tracked down all the issues I'm seeing (still not done, but hopefully getting there).

-Daniel
I have some good and bad news.

I updated both the kernel and the GFX stack to the latest versions.

The bad news is that 3D is now completely hosed: all 3D applications either don't start (they complain about a failed DRI2 request) or display a black window.

On the other hand, both the green overlay and the garbage after resume are gone. The overlay just works (and I checked with xvinfo that it is enabled and is the preferred adaptor).

I then booted the old kernel, and the overlay issues came back. Thus I suspect that this was accidentally fixed in the kernel. I will build the old GFX stack + new kernel again to see if that is true.

Best regards, Maxim Levitsky
> --- Comment #21 from maximlevitsky@gmail.com 2010-01-15 16:05:55 PST ---
> I have some good and bad news.
>
> I updated both kernel and GFX stack to latest versions.
>
> The bad news are that now 3D is completely hosed, all 3d applications either
> don't start (show complain about failed DRI2 request) or display black window.
>
> On the other hand both green overlay and garbage after resume is gone. Overlay
> just works (and I did see that it is enabled by doing xvinfo, and it is
> preferred one)
>
> I then booted old kernel, and overlay issues come back. Thus I suspect that
> this was accidentally fixed in kernel.

That's really interesting.

> Will compile again old GFX stack + new kernel to see if that is true.

If this is true, can you please post the exact git revisions of the first good and the last bad kernel. Perhaps I'll get a clue as to what the problem is.

Thanks, Daniel

btw: I'll be mostly offline for 2 weeks now, so expect some latencies.
Yep, I downgraded mesa and xserver, and 3D is fine. The overlay is displayed correctly after suspend to ram. On the other hand, I do see the green window on the first run; I just didn't notice it before. I will definitely bisect to find what fixed that bug. This is very important, because otherwise it can surface again. It's a bit weird, though, to mark good commits as bad and vice versa...
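Bisecting for a fix is the same binary search as bisecting for a regression, only with the labels swapped: "broken" plays the role of good (old) and "fixed" plays the role of bad (new). A minimal sketch of the idea, assuming a linear history in chronological order in which the bug does not come back once fixed; the commit names and the `is_fixed` predicate are hypothetical:

```python
def first_fixed(commits, is_fixed):
    """Return the first commit for which is_fixed(commit) is True.

    Assumes `commits` is in chronological order and the history looks like
    broken, ..., broken, fixed, ..., fixed. Returns None if nothing is fixed.
    """
    lo, hi = 0, len(commits)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid          # the fix is at mid or earlier
        else:
            lo = mid + 1      # the fix is after mid
    return commits[lo] if lo < len(commits) else None


if __name__ == "__main__":
    history = [f"c{i}" for i in range(8)]     # made-up commit names
    fixed_from = 5                            # pretend c5 introduced the fix
    tested = []

    def is_fixed(commit):
        tested.append(commit)                 # record which commits were built and tested
        return history.index(commit) >= fixed_from

    print(first_fixed(history, is_fixed), "after testing", tested)
```

Newer git versions also accept custom terms, e.g. `git bisect start --term-old=broken --term-new=fixed`, which avoids having to call a working commit "bad".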
Any news on your bisection? I'd really like to know what fixed your problem. -Daniel
Really sorry. I will do the bisection really soon.
Assuming the bisection will help Daniel fix this issue quickly.
Just two funny things about git:

Bisecting: 0 revisions left to test after this (roughly 0 steps)

e8b60faea972604c315634cff62d44803731ea9 is first bad commit
commit 7e8b60faea972604c315634cff62d44803731ea9
Author: Andrew Lutomirski <luto@mit.edu>
Date: Sun Nov 8 13:49:51 2009 -0500

    drm/i915: restore render clock gating on resume

    Rather than restoring just a few clock gating registers on resume,
    just reinitialize the whole thing.

    Signed-off-by: Andy Lutomirski <luto@mit.edu>
    [anholt: Fixed up for RC6 support landed since the patch was written]
    Signed-off-by: Eric Anholt <eric@anholt.net>

OK, now seriously. I bisected fix for this bug, and e8b60faea972604c315634cff62d44803731ea9 is the fix.
7e8b60faea972604c315634cff62d44803731ea9 of course
> --- Comment #27 from maximlevitsky@gmail.com 2010-02-06 12:05:24 PST ---
> Just two funny things about git:
>
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
>
> e8b60faea972604c315634cff62d44803731ea9 is first bad commit
> commit 7e8b60faea972604c315634cff62d44803731ea9
> Author: Andrew Lutomirski <luto@mit.edu>
> Date: Sun Nov 8 13:49:51 2009 -0500
>
> drm/i915: restore render clock gating on resume
>
> Rather than restoring just a few clock gating registers on resume,
> just reinitialize the whole thing.
>
> Signed-off-by: Andy Lutomirski <luto@mit.edu>
> [anholt: Fixed up for RC6 support landed since the patch was written]
> Signed-off-by: Eric Anholt <eric@anholt.net>

Thanks a lot for bisecting this; it makes some sense as a fix. I would never have come up with such an idea, so there was definitely a decent amount of luck involved ;)