Bug 94676

Summary: Possible kernel regression for gen3 and earlier in i915.ko
Product: DRI
Reporter: Vivek Dasmohapatra <vivek>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: unspecified
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform: I945GM
i915 features:
Attachments:
dmesg output from a 4.3.5 (4.3 bpo kernel from debian backports) booting on a gen3 based laptop (flags: none)

Description Vivek Dasmohapatra 2016-03-23 20:34:26 UTC
Created attachment 122505
dmesg output from a 4.3.5 (4.3 bpo kernel from debian backports) booting on a gen3 based laptop

I booted an old laptop with a gen3 (I think) GPU on a 4.3.5 kernel:

Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03)

There were lots of warnings in dmesg and the graphics output was corrupted
as described here: 

https://01.org/linuxgraphics/forum/graphics-installer-discussions/intel-graphics-1.4.0-32bit-ubuntu-15.10-login-screen-corruption

The original test used the 01.org backported driver, but the user
was kind enough to give me access to the machine in question,
so I was able to try a 4.3.5 kernel on it and saw the same behaviour.

So far there appear to be two problems. The first is in intel_sanitize_crtc:

	/* We need to sanitize the plane -> pipe mapping first because this will
	 * disable the crtc (and hence change the state) if it is wrong. Note
	 * that gen4+ has a fixed plane -> pipe mapping.  */
	if (INTEL_INFO(dev)->gen < 4 && !intel_check_plane_mapping(crtc)) {

In 4.2.x this section called intel_crtc_disable_planes() and did some other
cleanup besides. It now calls intel_crtc_disable_noatomic(), which in turn
relies on the recorded plane_mask. That mask has not been set yet at this
point, so nothing gets disabled, and when intel_disable_pipe() is called
later, assert_planes_disabled() blows a gasket:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 81 at /build/linux-5EEdAm/linux-4.3.5/drivers/gpu/drm/i915/intel_display.c:1389 assert_planes_disabled+0x108/0x140 [i915]()
plane B assertion failure, should be off on pipe B but is still active
Modules linked in: psmouse ata_piix(+) libata scsi_mod sdhci_pci tg3 sdhci ptp pps_core mmc_core libphy firewire_ohci firewire_core crc_itu_t ehci_pci(+) fan thermal wmi i915(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops uhci_hcd video ehci_hcd button drm usbcore usb_common
CPU: 1 PID: 81 Comm: systemd-udevd Not tainted 4.3.0-0.bpo.1-686 #1 Debian 4.3.5-1~bpo8+1
Hardware name: Hewlett-Packard HP Compaq nc6320 (RH374ET#ABU)/30AA, BIOS 68YDU Ver. F.0D 04/17/2007
 00000000 49d9f2cb f5cb1a24 c12999b5 f5cb1a34 c10578e1 f8397e04 f5cb1a54
 00000051 f8397b3c 0000056d f8327438 00000009 0000056d f8327438 f5ec0000
 00000042 00000001 f5cb1a40 c105794e 00000009 f5cb1a34 f8397e04 f5cb1a54
Call Trace:
 [<c12999b5>] ? dump_stack+0x3e/0x59
 [<c10578e1>] ? warn_slowpath_common+0x91/0xc0
 [<f8327438>] ? assert_planes_disabled+0x108/0x140 [i915]
 [<f8327438>] ? assert_planes_disabled+0x108/0x140 [i915]
 [<c105794e>] ? warn_slowpath_fmt+0x3e/0x60
 [<f8327438>] ? assert_planes_disabled+0x108/0x140 [i915]
 [<f8331424>] ? intel_disable_pipe+0x44/0x290 [i915]
 [<f832a8f3>] ? assert_vblank_disabled+0x13/0x70 [i915]
 [<f8334ad8>] ? i9xx_crtc_disable+0x68/0x3d0 [i915]

Having just looked at the 4.4 branch in stable, I think this may have
been fixed by 634b3a4a476e96816d5d6cd5bb9f8900a53f56ba.
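
To make the ordering problem described above concrete, here is a minimal toy model in C. It is only a sketch: the struct and the toy_* helpers are hypothetical names invented for illustration and are not i915 code. It assumes nothing more than what is stated above, namely that the disable path trusts a recorded mask that is still empty while a plane is in fact on. The unconditional variant is included purely for contrast and is not a claim about what any kernel version does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical types and helpers, not i915 code. */
struct toy_crtc {
    uint32_t hw_planes;  /* planes actually enabled in "hardware" */
    uint32_t plane_mask; /* planes the software state has recorded */
};

/* Disable only the planes that the recorded mask says are on. */
void toy_disable_by_mask(struct toy_crtc *crtc)
{
    assert(crtc != NULL);
    crtc->hw_planes &= ~crtc->plane_mask;
}

/* Disable every plane regardless of the recorded mask (for contrast). */
void toy_disable_all(struct toy_crtc *crtc)
{
    assert(crtc != NULL);
    crtc->hw_planes = 0;
}

/* Stand-in for a "planes must be off" check: true when nothing is active. */
bool toy_planes_disabled(const struct toy_crtc *crtc)
{
    return crtc->hw_planes == 0;
}
```

With hw_planes set to a non-zero value and plane_mask still 0, toy_disable_by_mask() leaves the plane on and toy_planes_disabled() returns false, which is the shape of the warning in the trace above. toy_disable_all() clears it from the same starting state.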

--------------------------------------------------------------------------

Next up is drm_atomic_commit(), called from intel_get_load_detect_pipe()
and a bunch of other places. This one seems to fail because one of the CRTCs
doesn't have a mode blob (and isn't enabled, I think), but
drm_atomic_check_only() tries to check its state anyway.

I think this is because intel_modeset_readout_hw_state() skips calling
drm_atomic_set_mode_for_crtc() on it, since its crtc->base.state->active
is false, so no mode_blob is set. I guess drm_atomic_check_only() should
likewise know enough to leave it alone? I'll confess to being a bit
lost here.
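
The mismatch suspected above can be sketched as a toy model in C. Everything here is hypothetical: the struct, its fields and the toy_* functions are invented for illustration and do not reproduce the DRM core's data structures or its actual checks. The sketch only assumes what is described above: the readout attaches a mode to active CRTCs only, while the check inspects every CRTC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the DRM core structures. */
struct toy_crtc_state {
    bool active;        /* CRTC was found running at readout time */
    bool has_mode_blob; /* a mode has been attached to the state */
};

/* Readout that attaches a mode only when the CRTC is active. */
void toy_readout(struct toy_crtc_state *s)
{
    assert(s != NULL);
    if (s->active)
        s->has_mode_blob = true;
}

/* Check that insists on a mode for every CRTC. Returns 0 or -1. */
int toy_check_strict(const struct toy_crtc_state *s)
{
    return s->has_mode_blob ? 0 : -1;
}

/* Check that leaves inactive CRTCs alone. Returns 0 or -1. */
int toy_check_skip_inactive(const struct toy_crtc_state *s)
{
    if (!s->active)
        return 0;
    return s->has_mode_blob ? 0 : -1;
}
```

For a state with active == false, toy_readout() sets nothing, so toy_check_strict() fails while toy_check_skip_inactive() passes. Whether the real check ought to behave like the second variant is exactly the open question here.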

I still have [remote] access to the machine in question if
anything needs testing.
Comment 1 Rodrigo Vivi 2016-03-29 22:07:03 UTC
Hi Vivek, 

In the forum you said: "Great - I think I've found the commit that introduced the problem"...

So it seems that your bisect found something... Although this atomic modeset changed a lot...

Anyway, what did your bisect tell you? What was the first bad commit?
Comment 2 Vivek Dasmohapatra 2016-03-30 00:27:00 UTC
(In reply to Rodrigo Vivi from comment #1)

I thought it was b70709a6f0e9176c2bc7ecf44acf015c7362ddc6, but too much
had changed for me to see whether that was actually the case. I looked at the
more recent commits, and it looks like 634b3a4a476e96816d5d6cd5bb9f8900a53f56ba
fixes at least the first part of the problem (plane_mask not being set yet when …sanitize_crtc is called).
Comment 3 Vivek Dasmohapatra 2016-03-30 21:04:49 UTC
Ok, confirmed that 634b3a4a476e96816d5d6cd5bb9f8900a53f56ba 
fixes the initial sanitize_crtc problem.

The problem with drm_atomic_commit(), first called from
intel_get_load_detect_pipe() but also from other places,
ending up in the no mode_blob error path is still there.
Comment 4 Jani Nikula 2016-09-20 12:48:44 UTC
Presumed fixed then, please reopen if the problem persists with latest kernels.