Bug 71588 - haswell UXA corruption when unblanking from "xset dpms force off"
Summary: haswell UXA corruption when unblanking from "xset dpms force off"
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-13 21:26 UTC by Ray Strode [halfline]
Modified: 2017-07-24 22:56 UTC

See Also:
i915 platform:
i915 features:


Attachments
reg dump (40.91 KB, text/plain)
2013-11-13 21:26 UTC, Ray Strode [halfline]
vbios (64.00 KB, text/plain)
2013-11-13 21:26 UTC, Ray Strode [halfline]
dmesg (130.13 KB, text/plain)
2013-11-13 21:26 UTC, Ray Strode [halfline]
picture of monitor (834.17 KB, image/jpeg)
2013-11-13 21:28 UTC, Ray Strode [halfline]
another case of corruption (147.52 KB, image/jpeg)
2013-11-13 21:30 UTC, Ray Strode [halfline]
i915_gem_gtt (30.38 KB, text/plain)
2013-11-13 21:33 UTC, Ray Strode [halfline]
Xorg.0.log (62.46 KB, text/plain)
2013-11-13 22:18 UTC, ryanlerch
another (different) form of corruption (1.68 MB, image/jpeg)
2013-11-14 14:40 UTC, ryanlerch

Description Ray Strode [halfline] 2013-11-13 21:26:08 UTC
Created attachment 89163 [details]
reg dump

Ryan has a swanky new Haswell workstation and when he runs xset dpms force off (or unblanks the lock screen in gnome) he sees screen corruption.  The output looks like a sometimes mangled view into previous screen contents.  When a fullscreen redraw is forced then it corrects itself. (will attach picture).

we've tried disabling fbc and powersave on the kernel command line and that doesn't help.

will attach dmesg, vbios, and reg dump.  The latter was done once before reproducing, once immediately after forcing screen blanked, once a second later, and once after we'd refreshed the screen.
Comment 1 Ray Strode [halfline] 2013-11-13 21:26:25 UTC
Created attachment 89164 [details]
vbios
Comment 2 Ray Strode [halfline] 2013-11-13 21:26:39 UTC
Created attachment 89165 [details]
dmesg
Comment 3 Ray Strode [halfline] 2013-11-13 21:28:53 UTC
Created attachment 89166 [details]
picture of monitor

notice it is offset a bit from the top
Comment 4 Ray Strode [halfline] 2013-11-13 21:30:03 UTC
Created attachment 89167 [details]
another case of corruption
Comment 5 Ray Strode [halfline] 2013-11-13 21:31:24 UTC
once the screen wasn't offset at all but was showing different shades of blue instead of window contents (maybe the pick buffer?) and sometimes it seemingly shows old contents but no obvious corruption. (like the terminal will show up without the command that was typed to blank the monitor)
Comment 6 Ray Strode [halfline] 2013-11-13 21:33:06 UTC
Created attachment 89168 [details]
i915_gem_gtt
Comment 7 Ray Strode [halfline] 2013-11-13 21:36:05 UTC
forgot to mention...only happens with UXA, not SNA
Comment 8 Ray Strode [halfline] 2013-11-13 21:41:30 UTC
    irc log:

    <halfline> hey, ryanlerch just grabbed me to show me a weird graphics corruption bug in the gnome-shell lock screen
    <halfline> after some finagling we figured out the problem happens when a monitor is dpms off'd and comes back
    <halfline> if he does xset dpms force off then when he comes back the screen contents are stale and or partially corrupted
    <halfline> maybe it's showing the backbuffer ? not 100% sure
    <halfline> this is with UXA
    <halfline> does that ring any bells for you guys?
    <halfline> problem notably doesn't happen with SNA or the modesetting ddx
    <marcheu> halfline: framebuffer compression?
    <halfline> marcheu: are you asking if fbc is enabled ? 
    <marcheu> halfline: yeah, I had a bug like that with bad interaction between fbc and blitter (which is only used in UXA)
    <marcheu> halfline: if so, bwidawsk send patches to fix it
    <marcheu> a month ago maybe?
    <marcheu> s/send/sent
    <halfline> marcheu: ah interesting, let me walk over and try forcefully disabling it
    <halfline> and see if that fixes the issue
    --- ryanlerch is now known as halfryan
    <halfryan> marcheu, booting with i915.i915_enable_fbc=0 doesn't fix the issue
    <halfryan> it happens on two haswell machines
    <marcheu> oh well :)
    --- halfryan is now known as ryanlerch
    <halfline> bwidawsk: any insight ?
    <bwidawsk> halfline: image
    <halfline> are you asking for a screenshot? sometimes it looks corrupted like this: http://i.imgur.com/6KPm4Gm.jpg with the screen contents offset
    <halfline> bwidawsk: but sometimes it looks perfectly fine, just "stale"
    <bwidawsk> can we disable stolen garbage?
    <bwidawsk> jbarnes: ?
    <bwidawsk> halfline: can you paste the contents of i915_gem_gtt in debugfs
    <halfline> bwidawsk: like if we run a command, in the terminal to xset dpms force off, we'll see the command on screen for a brief second before the monitors turn off
    <halfline> but when they come back on the screen contents will show the terminal before the command was typed
    <halfline> bwidawsk: after triggering the problem ?
    <halfline> yea one sec, let me walk back over
    <ryanlerch> bwidawsk, http://paste.fedoraproject.org/53828/38437139/
    <bwidawsk> ryanlerch: i don't see an easy way to disable stolen, but that's what my gut is telling me is screwed up
    <ryanlerch> one more copy of the file on a different machine, while the corruption is in progress: http://ur1.ca/g141g
    <ryanlerch> bwidawsk, what is "stolen" ?
    <ryanlerch> ^ that was halfline on my behalf
    <bwidawsk> just some special memory which has extra odds of being corrupted
    <halfline> bwidawsk: and do textures get migrated to this memory when the monitor is dpms off'd ?
    <bwidawsk> halfline: no
    <bwidawsk> but when things go off is when things like to write to stolen
    <bwidawsk> jbarnes, ickle: any simple way to disable stolen use
    <halfline> ah so theory is, stolen is being used for scan out
    <halfline> and then when the monitor goes off other things try to use it too
    <halfline> when it comes back on, it then scans out the junk
    <anholt> I would object to merging it to 10
    <stereotype441> I agree.  The informal rule is "only bug fixes get cherry-picked to stable", and adhering to that rule is one of our best tools for ensuring that stable really is stable.
    <jbarnes> bwidawsk, halfline: maybe PSR?
    <halfline> jbarnes: want me to try booting with i915.powersave=0 ?
    <jbarnes> halfline: maybe ping vivijim, he's our PSR expert
    <jbarnes> if your panel has PSR that could definitely be the issue
    <jbarnes> not sure if it's tested with UXA
    <halfline> jbarnes: what's the best way to test that?
    <halfline> psr = panel self refresh ?
    <jbarnes> halfline: right
    <jbarnes> lemme see
    <bwidawsk> just curious, why did we implicate psr?
    <jbarnes> halfline: i915_enable_psr
    <jbarnes> bwidawsk, halfline: just guessing
    <jbarnes> sounds like fbc or psr corruption
    --- ryanlerch is now known as halfryan
    <jbarnes> bwidawsk: not sure about the stolen theory, what's your thinking there?
    <halfryan> jbarnes: i don't see i915_enable_psr in modinfo output for i915
    <bwidawsk> jbarnes: it's just it's offset 0 which is corrupted
    <bwidawsk> and it looks like not pixel data
    <bwidawsk> the kernel is too old to tell me which one is currently being scanned out
    <bwidawsk> but I'd guess it's the one at stolen 0
    <halfryan> jbarnes, did the more granular option get added later, and powersave will do the same (+ more) ?
    <jbarnes> halfryan: yeah maybe it's too new
    <halfryan> okay will try rebooting with i915.powersave=0 then
    <jbarnes> bwidawsk: nothing should mess with stolen just with dpms off though
    <jbarnes> I mean, BIOS-wise anyway
    <jbarnes> there could definitely be some stray write hitting there
    <jbarnes> that would make sense
    <ryanlerch> jbarnes, so i915.powersave=0 does not fix it
    <ryanlerch> but maybe it's not rolled into that option?
    <jbarnes> ryanlerch: well if you don't have the psr option I don't think psr is to blame
    <ryanlerch> okay
    <jbarnes> so we're back to thinking it's a clobber of some kind
    <ryanlerch> i do see i915_sr_status says "self-refresh: enabled"
    <jbarnes> though really I'd just run sna :)
    <ryanlerch> not sure if that's related
    <jbarnes> ickle should probably just remove UXA by now
    <ryanlerch> i think ajax tried to switch fedora to SNA by default and hit some issues and had to change it back
    <jbarnes> lost to the mists of time though, at least for me
    <ajax> i got complained at, anyway.
    --- ryanlerch is now known as halfryan
    <halfryan> (woops this is halfline typing)
    <jbarnes> halfryan: have you filed a bug yet?
    <halfryan> have not
    <jbarnes> can you collect what you know so far with camera pics and file a new one?
    <halfryan> certainly
Comment 9 Chris Wilson 2013-11-13 21:43:37 UTC
Does this version of gnome-shell do any partial screen updates? If so, are those updates visible? If they are not, then this would correspond with a failed pageflip or modeset upon dpms restore that is not being corrected. I take it there are no warnings in Xorg.0.log?
Comment 10 Ray Strode [halfline] 2013-11-13 21:48:53 UTC
it is a version that does partial screen updates, and those updates aren't visible.  you can scroll the terminal around and it doesn't update, but hitting the super key makes it go to the overview and fixes the problem.

The only warnings in the Xorg log deal with the Cintiq plugged into it, nothing from the ddx. 

note it happens on two haswell machines (attachment 89166 [details] is a laptop and attachment 89167 [details] is a workstation).  All the bits except for attachment 89166 [details] are from the workstation.
Comment 11 Daniel Vetter 2013-11-13 21:51:27 UTC
Any special i915.ko module options? What happens when we disable fbc?

# grep '.*' /sys/module/i915/parameters/*
Comment 12 Ray Strode [halfline] 2013-11-13 21:53:12 UTC
<bwidawsk> halfline: which kernel were you running again?
<ryanlerch> bwidawsk, Linux woolloongabba 3.11.7-300.fc20.x86_64 #1 SMP Mon Nov 4 15:07:39 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Comment 13 Ray Strode [halfline] 2013-11-13 21:54:11 UTC
the only module parameters we've tried are i915_enable_fbc=0 and powersave=0

neither prevents the bug from happening
Comment 14 Ray Strode [halfline] 2013-11-13 21:59:30 UTC
# grep '.*' /sys/module/i915/parameters/*
/sys/module/i915/parameters/disable_power_well:1
/sys/module/i915/parameters/enable_hangcheck:Y
/sys/module/i915/parameters/enable_ips:1
/sys/module/i915/parameters/fbpercrtc:0
/sys/module/i915/parameters/i915_enable_fbc:-1
/sys/module/i915/parameters/i915_enable_ppgtt:-1
/sys/module/i915/parameters/i915_enable_rc6:-1
/sys/module/i915/parameters/invert_brightness:0
/sys/module/i915/parameters/lvds_channel_mode:0
/sys/module/i915/parameters/lvds_downclock:0
/sys/module/i915/parameters/lvds_use_ssc:-1
/sys/module/i915/parameters/modeset:-1
/sys/module/i915/parameters/panel_ignore_lid:1
/sys/module/i915/parameters/powersave:0
/sys/module/i915/parameters/preliminary_hw_support:0
/sys/module/i915/parameters/reset:Y
/sys/module/i915/parameters/semaphores:-1
/sys/module/i915/parameters/vbt_sdvo_panel_type:-1
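
For anyone collecting the same information on another machine, a listing like the one above can also be produced with a small shell function. This is only a sketch: the name `dump_params` is made up for illustration, and it assumes each parameter file holds a single-line value, as the i915 parameters shown above do.

```shell
#!/bin/sh
# dump_params DIR: print "path:value" for every readable file in DIR,
# mimicking the output of: grep '.*' DIR/*
# (dump_params is a hypothetical helper name, not part of any tool.)
dump_params() {
    dir=$1
    for f in "$dir"/*; do
        # Skip anything that is not a readable regular file
        # (this also covers an unmatched glob in an empty directory).
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s:%s\n' "$f" "$(cat "$f")"
    done
}

# Default to the i915 parameter directory when called without arguments.
dump_params "${1:-/sys/module/i915/parameters}"
```

Comparing the output from before and after adding a kernel command-line option (for example i915.i915_enable_fbc=0) is a quick way to confirm that the option actually took effect.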
Comment 15 ryanlerch 2013-11-13 22:18:57 UTC
Created attachment 89170 [details]
Xorg.0.log
Comment 16 Ray Strode [halfline] 2013-11-13 22:33:54 UTC
<ickle> might as well try Option "TripleBuffer" "false"
<halfline> that was what i attached to the bug
<halfline> k
* halfline walks over
<-- ryanlerch has quit (Remote host closed the connection)
<halfline> disabling triple buffering doesn't fix it
<halfline> this time we saw a mostly black screen
<halfline> with only the terminal window visible
<halfline> showing old contents
--> ryanlerch (ryanlerch@nat/redhat/x-ybmfkipdodgeyxyh) has joined #intel-gfx
<halfline> and the app menu in the upper left corner of the screen was blue
<halfline> couldn't get a photo of it in time
<halfline> before the screen refreshed on its own (not sure why it refreshed on its own)
<-- ryanlerch has quit (Remote host closed the connection)
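
For reference, the option ickle suggests is set in the Device section of the X server configuration. The fragment below is a minimal sketch, not the configuration actually used on the affected machines: the Identifier string is an arbitrary label, and the AccelMethod line is included only because the corruption is reported with UXA and not with SNA.

```
Section "Device"
    # Arbitrary label (assumption)
    Identifier "Intel Graphics"
    Driver     "intel"
    # The bug is reported with UXA only
    Option     "AccelMethod"  "uxa"
    # Setting under test in this comment
    Option     "TripleBuffer" "false"
EndSection
```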
Comment 17 ryanlerch 2013-11-14 14:40:17 UTC
Created attachment 89196 [details]
another (different) form of corruption
Comment 18 ryanlerch 2013-11-14 14:43:42 UTC
Just got it to happen again.

My steps were:
1. Using the gnome display, turned all but 1 monitor off.
2. logged out of gnome, then back in again.
3. ran "sleep 1; xset dpms force off; sleep 1;"
4. monitor went blank, waited 4 seconds for the monitors to turn off completely.
5. moved the mouse, and the screen as shown in https://bugs.freedesktop.org/attachment.cgi?id=89196 was on the monitor.
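
Steps 3 and 4 above could be scripted roughly as follows. This is a sketch under stated assumptions: `repro_blank`, `run`, and the `DRY_RUN` variable are illustrative names that do not appear in the report, and moving the mouse (step 5) still has to be done by hand.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN is set.
# (run, repro_blank and DRY_RUN are hypothetical names for this sketch.)
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# repro_blank: blank the display as in step 3, then wait as in step 4.
repro_blank() {
    run sleep 1                 # time to let go of the keyboard
    run xset dpms force off     # force the monitor off via DPMS
    run sleep 1
    run sleep 4                 # wait for the monitor to power down fully
}
```

Calling `repro_blank` from a terminal inside the affected X session and then moving the mouse should follow the sequence described above; setting `DRY_RUN=1` first only prints the commands.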
Comment 19 Chris Wilson 2014-01-17 19:00:13 UTC
Let's assume this is related to bug 71908, so does 
http://patchwork.freedesktop.org/patch/17588/ help?
Comment 20 Jani Nikula 2014-04-10 13:20:07 UTC
Chris, want to take this bug? I have no clues about it.
Comment 21 Jani Nikula 2014-05-07 12:40:19 UTC
(In reply to comment #19)
> Let's assume this is related to bug 71908, so does 
> http://patchwork.freedesktop.org/patch/17588/ help?

The patch has been merged; please try current drm-intel-nightly branch from http://cgit.freedesktop.org/drm-intel.
Comment 22 Chris Wilson 2014-05-08 10:09:04 UTC
We ruled out VT-d as well?
Comment 23 Jani Nikula 2014-08-26 13:10:41 UTC
Ping for a retest on current drm-intel-nightly.
Comment 24 Jani Nikula 2014-09-05 11:33:23 UTC
Timeout, presumed fixed based on comment #21, please reopen if the problem persists.