74022 – [pnv !DRM_I915_FBDEV] Hang upon resume

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	Other All

Importance:	medium critical
Assignee:	Daniel Vetter
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-01-24 16:22 UTC by Chris Wilson
Modified:	2017-02-21 16:54 UTC
CC List:	1 user

See Also:
i915 platform:	PNV
i915 features:	GEM/Other

Attachments
/sys/class/drm/card0/error shortened (28.58 KB, text/plain) 2015-05-30 12:19 UTC, Constantin Zankl

Description Chris Wilson 2014-01-24 16:22:07 UTC

Between 3.12 and drm-intel-nightly, we lose coherency after resume.

ring->get_seqno() = 0xffffeffe, but hws[0x20] = 0xfffff001

so the GPU is advancing and writing to the HWS page, but the seqno is not visible to the CPU: the cache snooping is bust. For SNB+, we had a similar issue which required flushing the ring TLBs.
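
To make the mismatch above concrete, here is a minimal sketch in plain userspace C of a wrap-safe seqno comparison and of the gap between the two values quoted. The function names are illustrative assumptions, not taken from the driver source; only the two hex values come from this report.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-safe "has seq1 reached seq2?" test. Taking the difference as a
 * signed 32-bit value keeps the answer correct across the 32-bit
 * rollover, provided the two values are less than 2^31 apart.
 * (Illustrative name, assumed for this sketch.)
 */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/*
 * How far the value stored in memory is ahead of the value the CPU
 * read back. Unsigned subtraction wraps modulo 2^32, so this also
 * works across a rollover. (Illustrative name, assumed for this sketch.)
 */
static uint32_t seqno_lag(uint32_t in_memory, uint32_t cpu_view)
{
	return in_memory - cpu_view;
}
```

With the values from the description, `seqno_lag(0xfffff001, 0xffffeffe)` is 3: the page holds a seqno three steps ahead of what the CPU read, so any waiter comparing against the stale value would never see its request complete.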

Bisect in progress.

Comment 1 Chris Wilson 2014-01-29 14:30:35 UTC

Bisect says bug does not exist. Grr :<

Next guess: it is a config option that changed recently.

Comment 2 Chris Wilson 2014-01-30 11:48:44 UTC

Found it, CONFIG_DRM_I915_FBDEV.

Comment 3 Ville Syrjala 2014-02-19 14:57:40 UTC

(In reply to comment #2)
> Found it, CONFIG_DRM_I915_FBDEV.

Weird stuff. Daniel introduced that sucker, so I'm throwing the bug in his direction.

Comment 4 Daniel Vetter 2014-03-03 14:22:00 UTC

On a hunch I expect the bios to stomp on the lower end of the gtt/stolen. Can you please test what happens
- with Jesse's fb takeover patches (pls double-check that we indeed wrap the fb correctly)
- and when disabling stolen if the fb wrap patches don't help.

Finally is the bios somehow resuming the display hw for us already or are all pipes off on takeover from resume?

Comment 5 Chris Wilson 2014-08-25 12:46:01 UTC

(In reply to comment #4)
> On a hunch I expect the bios to stomp on the lower end of the gtt/stolen.
> Can you please test what happens
> - with Jesse's fb takeover patches (pls double-check that we indeed wrap the
> fb correctly)

They already were included.

> - and when disabling stolen if the fb wrap patches don't help.

There wasn't stolen available at the introduction of the bug.
 
> Finally is the bios somehow resuming the display hw for us already or are
> all pipes off on takeover from resume?

Nope.

Back to testing and this is gone on my latest ring init routines. I might get round to seeing if it was fixed in the meantime...

Comment 6 Rodrigo Vivi 2014-10-15 19:11:23 UTC

Chris, can we close this?

Comment 7 Jani Nikula 2015-01-29 13:44:37 UTC

(In reply to Rodrigo Vivi from comment #6)
> Chris, can we close this?

Comment 8 Chris Wilson 2015-01-31 11:37:59 UTC

Very hard to say since the machine now locks up entirely with !FBDEV...

Comment 9 Constantin Zankl 2015-05-30 12:19:10 UTC

Created attachment 116169
/sys/class/drm/card0/error shortened

Comment 10 Constantin Zankl 2015-05-30 12:24:31 UTC

I have a similar problem here... GPU hangs after resume.

[ 1710.007256] [drm] stuck on render ring
[ 1710.008312] [drm] GPU HANG: ecode 0:0x772a3d58, in Xorg [1560], reason: Ring hung, action: reset
[ 1710.008316] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1710.008317] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1710.008320] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1710.008322] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1710.008323] [drm] GPU crash dump saved to /sys/class/drm/card0/error


This is happening on Linux Mint with Cinnamon. I'm not sure whether this is still a problem in the current version, but it occurs in 2:2.99.910-0ubuntu1.6 of xserver-xorg-video-intel.

Constantin

Comment 11 Chris Wilson 2015-05-30 13:09:35 UTC

(In reply to Constantin Zankl from comment #10)
> I have a similar problem here... GPU hangs after resume.

Similar but very unlikely to have FBDEV disabled, so please please file a new bug with an error state.

Comment 12 Chris Wilson 2015-08-25 16:08:51 UTC

Ah. New lead. init_bios() doesn't take a pin for the info->screen.base mapping and so on resume the fbcon is no longer present and the memset(info->screen.base, 0, info->screen.size) tramples over random buffers.  Oops, oops, oops.

Comment 13 Chris Wilson 2015-08-25 16:10:22 UTC

Hmm, or sadly that is not this bug, just another one.

Comment 14 Ileana 2016-04-19 10:45:25 UTC

Is this still an issue? There have been no updates for about 8 months.

Comment 15 Chris Wilson 2016-04-19 10:48:42 UTC

I actually suspect I've finally fixed it with "avoid ringbuffers at offset 0". Just don't regularly do non-default testing on my pnv.
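
As an illustration of what "avoid ringbuffers at offset 0" can mean in practice, here is a minimal sketch of a placement routine that takes a minimum-offset bias. It is a standalone toy under stated assumptions, not the driver's allocator: `place_with_bias`, `align_up`, and the 4096-byte page size are all hypothetical names and values chosen for the example.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u /* assumed page size for this sketch */

/* Hypothetical helper: round addr up to a power-of-two alignment. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical placement: return the lowest aligned offset inside the
 * hole [hole_start, hole_end) that is at or above min_offset and still
 * leaves room for size bytes. Returns UINT64_MAX if nothing fits.
 */
static uint64_t place_with_bias(uint64_t hole_start, uint64_t hole_end,
				uint64_t size, uint64_t align,
				uint64_t min_offset)
{
	uint64_t start = hole_start > min_offset ? hole_start : min_offset;

	start = align_up(start, align);
	if (start > hole_end || size > hole_end - start)
		return UINT64_MAX;
	return start;
}
```

Passing `PAGE_SIZE_BYTES` as `min_offset` keeps a buffer off offset 0 even when the hole starts there; passing 0 reproduces the unbiased placement.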

Comment 16 Ileana 2016-04-19 10:50:29 UTC

(In reply to Chris Wilson from comment #15)
> I actually suspect I've finally fixed it with "avoid ringbuffers at offset
> 0". Just don't regularly do non-default testing on my pnv.

Can you please confirm so we can change the status? Thanks!

Comment 17 Ricardo 2017-02-21 16:37:20 UTC

Chris, I will close this bug; if the problem still exists, please open a new bug.
