Bug 28252 - Regression: i965: corruption/GPU hang with wide windows.
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Jesse Barnes
Keywords: NEEDINFO
 
Reported: 2010-05-25 16:37 UTC by Nick Bowler
Modified: 2017-07-24 23:07 UTC (History)

Attachments
glxgears: Add option to enable override redirect. (3.48 KB, patch)
2010-05-25 16:37 UTC, Nick Bowler
Video of the corruption. (970.32 KB, video/x-matroska)
2010-05-26 08:02 UTC, Nick Bowler
Fix exchange validity check (3.46 KB, patch)
2010-06-01 09:45 UTC, Jesse Barnes

Description Nick Bowler 2010-05-25 16:37:35 UTC
Created attachment 35858
glxgears: Add option to enable override redirect.

Displaying an OpenGL window which is wider than the current framebuffer results
in graphical corruption and/or a GPU hang.  I have even seen the screen's DPI
setting magically change as a result, suggesting some more insidious memory
corruption is at play.

To make things more interesting, the issue only occurs if the OpenGL window is
the only thing visible on the entire screen.  For example, to reproduce on a
1680x1050 laptop screen, I run glxgears -geometry 2000x1500 (which works fine),
then reposition the window so that the OpenGL area extends beyond all four
sides of the screen (still working fine) and finally set the window's layer to
be above all other windows (including my panels), at which point the corruption
occurs immediately.
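
The trigger condition described above can be restated as a small geometric check. The Python sketch below only illustrates the reproduction condition; it is not driver code. The function name `covers_screen` and the window offsets in the example are assumptions, since the report gives only the sizes (a 2000x1500 window on a 1680x1050 screen).

```python
def covers_screen(win_x, win_y, win_w, win_h, screen_w, screen_h):
    """Return True if the window rectangle covers every pixel of the screen.

    Coordinates follow the X11 convention: (0, 0) is the top-left corner of
    the screen, and the window origin may be negative when it hangs off-screen.
    """
    return (
        win_x <= 0
        and win_y <= 0
        and win_x + win_w >= screen_w
        and win_y + win_h >= screen_h
    )


# Hypothetical placement: a 2000x1500 window shifted 100 pixels up and left
# so that it overhangs all four edges of a 1680x1050 screen.
print(covers_screen(-100, -100, 2000, 1500, 1680, 1050))  # True

# The same window moved 500 pixels to the right leaves a strip of screen exposed.
print(covers_screen(500, -100, 2000, 1500, 1680, 1050))   # False
```

Whether the window must also be the topmost one (as in the layering step above) is a separate condition that this check does not model.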

I have attached a patch for glxgears (against mesa/demos.git) which allows one
to enable override_redirect on the window, making it much easier to reproduce
without any window manager intervention: on a 1680x1050 screen, simply run
glxgears -override -geometry 2000x1500 and boom!

I'm using a T500 laptop with a GM45, xserver 1.8.1, git libdrm, git mesa.  Also occurs with mesa 7.8.1.

This is a regression from xf86-video-intel-2.10.0.  Bisection results follow.

1a76fa5574e8e8f88ac3518a4e4494e1af301dc1 is the first bad commit
commit 1a76fa5574e8e8f88ac3518a4e4494e1af301dc1
Author: Keith Packard <keithp@keithp.com>
Date:   Fri Jan 29 23:28:46 2010 -0800

    Initialize DRI2 info rec version 4 list of driver names

    With DRI2 supporting multiple subsystems, the video driver must
    initialize the list of driver names instead of just passing the single
    driver name used by Mesa. Without this, the X server will fail to
    initialize DRI2 as the numDrivers field in this structure will be
    uninitialized.

    Signed-off-by: Keith Packard <keithp@keithp.com>

:040000 040000 649cf372a1096de98cea5bb6f5df159d5b8159d5 49c76ac6b97ed5d3678fbeca3ec5961500bcb9d4 M      src

git bisect start
# bad: [b645ec83e0d86f2247b8338ceab60b9502516e70] uxa: Apply the drawable offset to the solid rects
git bisect bad b645ec83e0d86f2247b8338ceab60b9502516e70
# good: [091035146463bf1aa6674bff6947d04fc620c18f] configure.ac: Bump version to 2.10.0.
git bisect good 091035146463bf1aa6674bff6947d04fc620c18f
# bad: [90a971c60769781f53827b469a9be3aab14cf71c] uxa: Only reduce a composite to a BLT if it is wholly contained
git bisect bad 90a971c60769781f53827b469a9be3aab14cf71c
# bad: [086c0e25cac1d3dd0a37def8b5cb82c1c6279edf] i830_memory: rename i830_bind_all_memory to reflect code reality
git bisect bad 086c0e25cac1d3dd0a37def8b5cb82c1c6279edf
# bad: [a86869e6c3131b83a2ad529bc313270a9f45f5bd] Fix an unused variable warning for !INTEL_XVMC.
git bisect bad a86869e6c3131b83a2ad529bc313270a9f45f5bd
# good: [93cd943d41c646c794b8cb5a960d8f0805e15395] intel: Use the compositing-aware colorkey filler instead of homebrew fail.
git bisect good 93cd943d41c646c794b8cb5a960d8f0805e15395
# bad: [6610bcbac51c9ac970128012f0d4566d8cfba000] DRI2: only use version 4 APIs if kernel support exists
git bisect bad 6610bcbac51c9ac970128012f0d4566d8cfba000
# good: [5f93d019dc6311dd16b6792ffb60dbfc45ef3d08] uxa: Adjust uxa_get_color_for_pixmap to match prototype
git bisect good 5f93d019dc6311dd16b6792ffb60dbfc45ef3d08
# good: [918151a7955c26174db80b775205f6ffb4f44ab6] uxa: Fix compatible_formats() for OVER
git bisect good 918151a7955c26174db80b775205f6ffb4f44ab6
# bad: [1a76fa5574e8e8f88ac3518a4e4494e1af301dc1] Initialize DRI2 info rec version 4 list of driver names
git bisect bad 1a76fa5574e8e8f88ac3518a4e4494e1af301dc1
Comment 1 Nick Bowler 2010-05-25 18:20:32 UTC
(In reply to comment #0)
> I have even seen the screen's DPI setting magically change as a result,
> suggesting some more insidious memory corruption is at play.

Ignore this bit about the DPI.  I just realized that 'xrandr -s ...' changes
DPI (and rightfully so!), and I remembered that I was playing with xrandr
before figuring out that this is reproducible without it.
Comment 2 Chris Wilson 2010-05-26 04:11:04 UTC
Nick, can I ask you to retest? On my gm45 [x200s] using mostly current trees, I don't see any corruption with your patched glxgears.
Comment 3 Nick Bowler 2010-05-26 05:09:41 UTC
(In reply to comment #2)
> Nick, can I ask you to retest? On my gm45 [x200s] using mostly current trees, I
> don't see any corruption with your patched glxgears.

I pulled the latest changes from xf86-video-intel, but the problem still persists.  The corruption I see is quite spectacular; it can't be missed.

However, I discovered that if I run xcompmgr, then my test case works normally.
Comment 4 Nick Bowler 2010-05-26 08:02:36 UTC
Created attachment 35870
Video of the corruption.

Here's a video of what I see on the screen when the GPU is not hanging.  It's a little shaky as it's filmed with a handheld camera, and it's very low quality due to Bugzilla attachment limits.

The GPU typically hangs after I close glxgears.  As usual, a reboot is required to recover.
Comment 5 Chris Wilson 2010-05-29 02:56:27 UTC
Hmm, I wonder if this is possibly related (wild stab, not even sure if we're intelligent enough to enable pageflipping in this case...):

commit 44d45d3fa56f121ce89ffe5b28beb48be01a95df
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat May 29 10:39:28 2010 +0100

    dri: Use size from backing pixmap when creating buffers.
    
    This avoid using the garbage values stored in the Screen drawable,
    instead of the true values which are only maintained in its backing
    pixmap. The consequence of using the wrong size was to hand a 1x1
    pixmap to metacity/mutter and have it believe it was a full screen
    drawable; GPU hangs ensued if using page flipping.
Comment 6 Nick Bowler 2010-05-29 07:32:53 UTC
I realize it's since been reverted, but I just tested that commit.  It doesn't actually solve the problem (GPU still dies), but it does *change* the corruption.  A bit hard to describe...
Comment 7 Nick Bowler 2010-05-31 17:09:20 UTC
Commit e2615cdeef078 ("dri: Only flip if the front and back pixmaps match.")
plus the subsequent compilation fix seems to have fixed the corruption/hang.  

However, these commits have broken vsync for windowed applications:

  % glxgears        
  Running synchronized to the vertical refresh.  The framerate should be
  approximately the same as the monitor refresh rate.
  3270 frames in 5.0 seconds = 653.891 FPS
  3381 frames in 5.0 seconds = 676.111 FPS
  3243 frames in 5.0 seconds = 648.545 FPS
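
The output above shows frame rates roughly ten times what a synchronized glxgears would report. As a rough illustration, the following Python sketch parses glxgears output and flags it when every sample is well above the refresh rate. The helper names and the 60 Hz refresh rate are assumptions; the report does not state the panel's actual refresh rate.

```python
import re

# Matches lines such as "3270 frames in 5.0 seconds = 653.891 FPS".
FPS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def parse_fps(output):
    """Extract the reported FPS values from glxgears output text."""
    return [float(m.group(3)) for m in FPS_LINE.finditer(output)]


def vsync_looks_broken(fps_values, refresh_hz=60.0, tolerance=0.10):
    """Heuristic: every sample exceeds the refresh rate by more than `tolerance`.

    refresh_hz=60.0 is an assumed value, not taken from the report.
    """
    limit = refresh_hz * (1.0 + tolerance)
    return bool(fps_values) and all(fps > limit for fps in fps_values)


sample = """\
3270 frames in 5.0 seconds = 653.891 FPS
3381 frames in 5.0 seconds = 676.111 FPS
3243 frames in 5.0 seconds = 648.545 FPS
"""

rates = parse_fps(sample)
print(rates)
print(vsync_looks_broken(rates))
```

This is only a heuristic for reading the log: a high frame rate shows that swaps are not being throttled, but it does not say which code path is responsible.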
Comment 8 Jesse Barnes 2010-06-01 09:28:31 UTC
Which commit broke vsync?  Do you have any X log messages indicating a problem getting vblank counts?
Comment 9 Jesse Barnes 2010-06-01 09:32:30 UTC
Ah yeah, looks like that last commit will prevent swaps from working correctly even if we end up not flipping...
Comment 10 Jesse Barnes 2010-06-01 09:45:02 UTC
Created attachment 35993
Fix exchange validity check

This fix really belongs in the server, I think, but it should fix vsync while keeping the flipping fix Chris found.
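
The thread does not show the patch itself, so the following Python sketch is only a conceptual model of the idea discussed here: exchange the front and back buffers only when they match, and otherwise fall back to a copy, with both paths still waiting for vblank. The `Pixmap` class, `can_exchange`, `choose_swap` and the example sizes are all hypothetical and are not taken from the driver or the X server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pixmap:
    # Hypothetical stand-in for a pixmap's geometry; not the X server struct.
    width: int
    height: int
    bits_per_pixel: int


def can_exchange(front, back):
    """Buffers may be exchanged only if their size and depth match."""
    return (
        front.width == back.width
        and front.height == back.height
        and front.bits_per_pixel == back.bits_per_pixel
    )


def choose_swap(front, back):
    """Pick a swap method; in this model both paths stay synchronized to vblank."""
    method = "exchange" if can_exchange(front, back) else "blit"
    return {"method": method, "wait_for_vblank": True}


# Assumed example: a screen-sized front pixmap and an oversized window buffer.
screen = Pixmap(1680, 1050, 32)
window = Pixmap(2000, 1500, 32)
print(choose_swap(screen, window))  # mismatched sizes, so copy instead
print(choose_swap(screen, screen))  # matching buffers, so exchange
```

The point of the model is that a failed exchange check selects a different swap method without disabling vblank synchronization, which matches the symptom reported in comment 7.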
Comment 11 Nick Bowler 2010-06-01 10:26:57 UTC
Yup, with that patch vsync works again and the corruption is gone.
Comment 12 Jesse Barnes 2010-06-01 13:49:08 UTC
Fix committed, thanks for testing.

commit f2272402035574c206a0e3383c55373c440fd928
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Tue Jun 1 13:46:15 2010 -0700

    DRI2: fix new buffer exchange check

