Bug 102929

Summary: Kernel 4.13.1 breaks screen output
Product: DRI
Reporter: Axel braun <axel.braun>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: medium
CC: 2bluesc, alexvillacislasso, creideiki+freedesktop-bugzilla, freedesktop_bugzilla, intel-gfx-bugs, jano.vesely, jefferym, serhiy.int
Version: unspecified
Keywords: regression
Hardware: Other
OS: All
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=98930
Whiteboard: ReadyForDev
i915 platform: I945GM
i915 features: display/Other
Attachments:
- dmesg output (flags: none)
- Patch for 4.13 reverting bad commit (flags: none)
- Patch (2) for 4.13 fixing compilation after reverting the bad commit (flags: none)

Description Axel braun 2017-09-21 16:15:49 UTC
As recommended in https://bugzilla.suse.com/show_bug.cgi?id=1058870 , opening a bug for this component:

plane B assertion failure, should be off on pipe B but is still active
------------[ cut here ]------------
WARNING: CPU: 1 PID: 259 at ../drivers/gpu/drm/i915/intel_display.c:1252 assert_planes_disabled+0x128/0x140 [i915]
Modules linked in: ata_piix i915(+) serio_raw i2c_algo_bit uhci_hcd ehci_pci drm_kms_helper ehci_hcd syscopyarea sysfillrect sysimgblt usbcore fb_sys_fops drm video button sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua
CPU: 1 PID: 259 Comm: systemd-udevd Not tainted 4.13.1-1-pae #1
Hardware name: ASUSTeK Computer INC. 901/901, BIOS 1903    03/17/2009
task: f4ae4700 task.stack: f4b3c000
EIP: assert_planes_disabled+0x128/0x140 [i915]
EFLAGS: 00010286 CPU: 1
EAX: 00000046 EBX: f4bbc000 ECX: f71eb340 EDX: f71e486c
ESI: 00000001 EDI: 00000001 EBP: f4b3dbe4 ESP: f4b3dbc8
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 80050033 CR2: b7152000 CR3: 34aeca60 CR4: 000006f0
Call Trace:
 ? intel_disable_pipe+0x45/0x220 [i915]
 ? i9xx_crtc_disable+0x5c/0x3a0 [i915]
 ? drm_connector_list_iter_next+0x71/0xa0 [drm]
 ? drm_atomic_add_affected_connectors+0xd0/0xf0 [drm]
 ? intel_crtc_disable_noatomic+0xb5/0x290 [i915]
 ? intel_modeset_setup_hw_state+0xa58/0xc40 [i915]
 ? intel_modeset_setup_hw_state+0xa85/0xc40 [i915]
 ? intel_modeset_init+0x4d1/0xea0 [i915]
 ? intel_setup_gmbus+0x1e1/0x270 [i915]
 ? i915_driver_load+0xa4c/0x1330 [i915]
 ? local_pci_probe+0x35/0x80
 ? pci_device_probe+0x158/0x170
 ? driver_probe_device+0x2b9/0x410
 ? __driver_attach+0x99/0xe0
 ? driver_probe_device+0x410/0x410
 ? bus_for_each_dev+0x4f/0x80
 ? driver_attach+0x19/0x20
 ? driver_probe_device+0x410/0x410
 ? bus_add_driver+0x187/0x230
 ? 0xf8306000
 ? driver_register+0x56/0xd0
 ? 0xf8306000
 ? do_one_initcall+0x46/0x170
 ? do_init_module+0x21/0x1d9
 ? do_init_module+0x50/0x1d9
 ? load_module+0x1435/0x1980
 ? SyS_finit_module+0x79/0xc0
 ? do_fast_syscall_32+0x71/0x150
 ? entry_SYSENTER_32+0x4e/0x7c
Code: ff 83 c4 10 e9 44 ff ff ff 57 68 a0 91 29 f8 e8 cc 8c 2d e0 0f ff 58 5a e9 30 ff ff ff ff 75 f0 50 68 d8 91 29 f8 e8 b5 8c 2d e0 <0f> ff 83 c4 0c e9 3e ff ff ff 8d b4 26 00 00 00 00 8d bc 27 00
---[ end trace 0e7d08f54709149f ]---
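When comparing logs from several machines, it can help to reduce each warning to its source location and symbol. The snippet below is a minimal sketch that pulls those two fields out of `WARNING:` lines like the one above; the function name `warn_sites` is made up for illustration, and the pattern only assumes the line layout seen in this trace.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "source-file:line symbol+offset/size" for every WARNING line.
warn_sites() {
    # Any prefix (such as a dmesg timestamp) before "WARNING:" is dropped;
    # the trailing "[module]" tag is dropped as well.
    sed -n 's/^.*WARNING: CPU: [0-9]* PID: [0-9]* at \([^ ]*\) \([^ ]*\).*$/\1 \2/p'
}
```

A typical use would be `dmesg | warn_sites | sort | uniq -c`. If a matching i915.ko built with debug information is available (an assumption, since distribution kernels often ship it separately), the `symbol+offset/size` field is the form that `scripts/faddr2line` in the kernel tree accepts.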
Comment 1 Elizabeth 2017-09-25 22:45:23 UTC
Hello Axel, 
Could you please attach dmesg from boot until the problem occurs, with the parameters drm.debug=0x1e log_buf_len=2M added to the kernel command line in GRUB? Also, is your BIOS up to date?
Thanks.
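
One way to make the requested parameters persistent is to append them to the kernel command line in the GRUB defaults file. The sketch below assumes a GRUB 2 setup with a `GRUB_CMDLINE_LINUX_DEFAULT` line and GNU sed; the function name `add_kernel_params` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT
# in the given GRUB defaults file (normally /etc/default/grub).
# Parameters must not contain the characters | or &, which sed would interpret.
add_kernel_params() {
    file=$1
    shift
    params=$*
    # Insert the parameters just before the closing quote of the variable.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${params}\"|" "$file"
}
```

As root, this might be run as `add_kernel_params /etc/default/grub drm.debug=0x1e log_buf_len=2M`, followed by regenerating the GRUB configuration (`update-grub` on Debian-based systems, `grub2-mkconfig -o /boot/grub2/grub.cfg` on openSUSE or Fedora), rebooting, and saving the log with `dmesg > dmesg.txt`. For a one-off boot, pressing `e` in the GRUB menu and appending the parameters to the `linux` line avoids editing any file.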
Comment 2 Axel braun 2017-09-26 19:37:23 UTC
Created attachment 134496 [details]
dmesg output
Comment 3 Axel braun 2017-09-26 19:39:11 UTC
(In reply to Elizabeth from comment #1)
> Hello Axel, 
> Could you please attach dmesg with drm.debug=0x1e log_buf_len=2M parameter
> on grub from boot till problem. Also is your bios updated?

The BIOS is the second-to-last release, dated 2009. There are no further updates for this old hardware.
Comment 4 Felix W 2017-11-12 13:03:53 UTC
Can confirm this problem on a Samsung NC10 with Intel Atom (945GSE) graphics. Using newest Arch Linux 32 bit kernel 4.13.9, the left ~70% of the screen becomes a flickering mess after the modesetting driver is loaded. Kernel 4.12.8 works.
Comment 5 Elizabeth 2017-11-13 16:28:23 UTC
(In reply to Felix W from comment #4)
> Can confirm this problem on a Samsung NC10 with Intel Atom (945GSE)
> graphics. Using newest Arch Linux 32 bit kernel 4.13.9, the left ~70% of the
> screen becomes a flickering mess after the modesetting driver is loaded.
> Kernel 4.12.8 works.
Thanks for the update, Felix. In that case it could be a regression. If it is easily reproducible, could you please try to bisect the issue?
Comment 6 Kai-Heng Feng 2017-11-23 05:07:27 UTC
(In reply to Elizabeth from comment #5)
> (In reply to Felix W from comment #4)
> > Can confirm this problem on a Samsung NC10 with Intel Atom (945GSE)
> > graphics. Using newest Arch Linux 32 bit kernel 4.13.9, the left ~70% of the
> > screen becomes a flickering mess after the modesetting driver is loaded.
> > Kernel 4.12.8 works.
> Thanks for the update Felix, in that case, it could be a regression, if easy
> reproducible could you please try to bisect the issue??

I've been bisecting for the users [1], but it's blocked by another issue, which hangs the system at the initramfs.

Instead of bisecting the bisection, can you take a look directly?

[1] https://bugs.launchpad.net/linux/+bug/1724639
Comment 7 Felix W 2017-11-24 12:23:51 UTC
Took the time to bisect the issue on my NC10. Compiling the kernel on this machine takes nearly 3 hours even with an SSD, so sorry for the delay. ;-)

git bisect start
# bad: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
git bisect bad 569dbb88e80deb68974ef6fdd6a13edb9d686261
# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# good: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect good ac7b75966c9c86426b55fe1c50ae148aa4571075
# bad: [9c284c41c0886f09e75c323a16278b6d353b0b4a] mmc: tmio-mmc: fix bad pointer math
git bisect bad 9c284c41c0886f09e75c323a16278b6d353b0b4a
# good: [f263fbb8d60824993c1b64385056a3cfdbb21d45] Merge tag 'pci-v4.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
git bisect good f263fbb8d60824993c1b64385056a3cfdbb21d45
# bad: [04d4fb5fa63876d8e7cf67f2788aecfafc6a28a7] Merge branch 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect bad 04d4fb5fa63876d8e7cf67f2788aecfafc6a28a7
# bad: [562ff059bd5f8f04881256532c6d835af3db55bd] Merge tag 'omapdrm-4.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into drm-next
git bisect bad 562ff059bd5f8f04881256532c6d835af3db55bd
# bad: [d2f5e36df61f4f06b9a52605fba23b3c3608efca] drm/i915/skl+: Perform wm level calculations in separate function
git bisect bad d2f5e36df61f4f06b9a52605fba23b3c3608efca
# good: [29ef3fa987edb9768e19a6325030e1d2d58e29de] drm/i915: Unwrap top level fence-array
git bisect good 29ef3fa987edb9768e19a6325030e1d2d58e29de
# bad: [659056f257e01fbc81e7d0887af7551d4f145130] drm/i915: Split cursor check_plane into i845 and i9xx variants
git bisect bad 659056f257e01fbc81e7d0887af7551d4f145130
# good: [9bacd4b1f8553428c5723e4c8f2ca491b400e429] drm/i915/dp: Check error return during DPCD capability queries
git bisect good 9bacd4b1f8553428c5723e4c8f2ca491b400e429
# good: [0f95ff8505b25aa29530818afcf12ff08399ccf5] drm/i915: Refactor the g4x TLB miss w/a to a helper
git bisect good 0f95ff8505b25aa29530818afcf12ff08399ccf5
# good: [d509e28b70e45ea0699128764d05893bcf347007] drm/i915: Parametrize cursor/primary pipe select bits
git bisect good d509e28b70e45ea0699128764d05893bcf347007
# bad: [cd5dcbf1b26c60dfa8fd8628fd2fcf3d33877c63] drm/i915: Clean up cursor junk from intel_crtc
git bisect bad cd5dcbf1b26c60dfa8fd8628fd2fcf3d33877c63
# bad: [1cecc830e6b662a765d60860112cf69182b56b8e] drm/i915: Refactor CURBASE calculation
git bisect bad 1cecc830e6b662a765d60860112cf69182b56b8e
# bad: [282dbf9b017bc6d5fdaeadf14e534c2fe22fee2d] drm/i915: Pass intel_plane and intel_crtc to plane hooks
git bisect bad 282dbf9b017bc6d5fdaeadf14e534c2fe22fee2d
# first bad commit: [282dbf9b017bc6d5fdaeadf14e534c2fe22fee2d] drm/i915: Pass intel_plane and intel_crtc to plane hooks


I reverted the bad commit on top of v4.13 and added some possibly dirty workarounds to make it compile. Will attach both commits as patches.

Unfortunately, I had no time to look into it further but maybe you'll identify the culprit quickly. It may be relevant that the cursor is still rendered fine over the whole screen area even if the regression is present in the kernel.
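
For anyone who wants to repeat this, the log above can be saved to a file and reused. The snippet below is a minimal sketch that extracts the culprit hash from such a log; the function name `first_bad_commit` and the file name `bisect.log` are made up for illustration, and the commented commands assume a clone of the mainline kernel tree.

```shell
#!/bin/sh
# Hypothetical helper: print the 40-character hash from the
# "# first bad commit: [...]" line of a saved `git bisect log` file.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]\{40\}\)].*/\1/p' "$1"
}

# Possible follow-up inside a kernel clone (not run here):
#   git bisect replay bisect.log      # re-apply the recorded good/bad marks
#   git bisect reset                  # leave bisect mode again
#   git checkout -b revert-test v4.13
#   git revert "$(first_bad_commit bisect.log)"
```

As noted above, the plain revert does not compile on v4.13 without further changes, so the second attached patch (or an equivalent fix-up) is still needed.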
Comment 8 Felix W 2017-11-24 12:27:02 UTC
Created attachment 135694 [details] [review]
Patch for 4.13 reverting bad commit
Comment 9 Felix W 2017-11-24 12:30:59 UTC
Created attachment 135695 [details] [review]
Patch (2) for 4.13 fixing compilation after reverting the bad commit
Comment 10 Ville Syrjala 2017-12-07 18:00:35 UTC
*** Bug 102931 has been marked as a duplicate of this bug. ***
Comment 11 Ville Syrjala 2017-12-07 18:06:34 UTC
(In reply to Felix W from comment #7)
> # first bad commit: [282dbf9b017bc6d5fdaeadf14e534c2fe22fee2d] drm/i915:
> Pass intel_plane and intel_crtc to plane hooks

Hmm. I failed to realize this was such a recent regression, and so widespread. Thanks for the bisect.

Can someone please verify that these backports work:
git://github.com/vsyrjala/linux.git plane_assert_v4.13
git://github.com/vsyrjala/linux.git plane_assert_v4.14

If everything looks good I'll try to get the fixes into the next 4.15-rc so that we can get the backports flowing into the stable kernels ASAP.
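
For testers, one possible way to pick and fetch the matching branch is sketched below. The function name `backport_branch` is made up for illustration, and the commented commands assume an existing kernel clone; note that GitHub no longer serves the `git://` protocol, so an `https://` URL for the same repository would be needed today, and the branches may no longer exist.

```shell
#!/bin/sh
# Hypothetical helper: map a kernel release string (as printed by `uname -r`)
# to the backport branch named in this comment.
backport_branch() {
    case $1 in
        4.13|4.13.*|4.13-*) echo plane_assert_v4.13 ;;
        4.14|4.14.*|4.14-*) echo plane_assert_v4.14 ;;
        *) echo "no backport branch listed for $1" >&2; return 1 ;;
    esac
}

# Possible use inside a kernel clone (not run here):
#   branch=$(backport_branch "$(uname -r)") &&
#   git fetch git://github.com/vsyrjala/linux.git "$branch" &&
#   git checkout -b "test-$branch" FETCH_HEAD
```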
Comment 12 Alex Villacís Lasso 2017-12-09 22:58:01 UTC
(In reply to Ville Syrjala from comment #11)
> (In reply to Felix W from comment #7)
> > # first bad commit: [282dbf9b017bc6d5fdaeadf14e534c2fe22fee2d] drm/i915:
> > Pass intel_plane and intel_crtc to plane hooks
> 
> Hmm. I failed to realize this was such a recent regression, and so
> widespread. Thanks for the bisect.
> 
> Can someone please verify that these backports work:
> git://github.com/vsyrjala/linux.git plane_assert_v4.13
> git://github.com/vsyrjala/linux.git plane_assert_v4.14
> 
> If everything looks good I'll try to get the fixes into the next 4.15-rc so
> that we can get the backports flowing into the stable kernels ASAP.

Alex here. plane_assert_v4.14 works correctly on my Acer Aspire One, which exhibited the bug with unpatched 4.13.x.
Comment 13 Felix W 2017-12-10 10:06:55 UTC
Just finished testing plane_assert_v4.13 and plane_assert_v4.14 on my Samsung NC10 - they both work without any graphical issues. Thank you!
Comment 14 Simon Hug 2017-12-14 23:21:39 UTC
I also have an Intel Atom N270 (with 945GSE) device that is affected by this regression. The console is shifted down and offscreen by two lines, and X just produces some colorful pixels at the top.

plane_assert_v4.14 fixed this for me as well. Thank you.
Comment 15 sergio.callegari 2017-12-25 13:45:49 UTC
Just a quick inquiry: is this fix already in some 4.15-rc?
Comment 16 Alex Villacís Lasso 2017-12-28 15:47:21 UTC
(In reply to sergio.callegari from comment #15)
> Just a quick inquire. Is this fix already in some 4.15rc?

As far as I can see, no (up to v4.15-rc5).
Comment 17 tanya_lug 2018-02-21 10:22:05 UTC
I'm on a triple-boot Asus eee-pc 1000H (2 Mint, 1 Manjaro with intel-ucode removed); I hope my findings help to fix this:

This is not purely a kernel issue but also a GRUB issue, triggered by the ‘fixed’ kernels of the 4.13/4.14 series, both in Mint and Manjaro. In a nutshell: if you boot a ‘fixed’ kernel of the 4.13 or 4.14 series via the top 2 lines of your GRUB menu, it will glitch, both on Mint and on Manjaro, but booting the ‘fixed’ kernels further down the GRUB menu works fine.

If you then run sudo grub-install /dev/sda from an OS further down the menu, the error shifts to the new OS in control of GRUB, allowing the others to boot without problems, including the one that glitched when it was in control of GRUB.

Other peculiarities of this glitch:
- clicks on the black area ‘get through’ to the windows below, and mouse-overs of those windows show up as cursor icon changes; you can ‘feel’ your way to a title bar, drag the window into the working 20% and use it
- if you remote desktop into your glitched machine there is no glitch

Also, you can fix the glitch from within the glitched system, in two ways:
I use the ‘newrez’ script to change my eee-pc resolution on the fly (https://www.linux-apps.com/content/show … nt=1346861). Place newrez.sh where you can see it (in the 20% strip) and run it (e.g. reloading the default), and the glitch is gone. Run ‘newrez default’ at startup and you can use the machine normally without GRUB_GFXPAYLOAD_LINUX="text". Similarly, I can fix it on my eee-pc by running:
xrandr --newmode "1024x600_60.00"   49.00  1024 1072 1168 1312  600 603 613 624 -hsync +vsync
xrandr --addmode VGA1 1024x600_60.00
xrandr --output VGA1 --mode 1024x600_60.00
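
As a sanity check on the modeline above, the refresh rates follow from the pixel clock and the total (visible plus blanking) line and frame sizes: vertical refresh = pixel clock / (htotal × vtotal), horizontal sync = pixel clock / htotal. The snippet below is a small sketch of that arithmetic; the function name `mode_rates` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: given pixel clock (MHz), horizontal total and
# vertical total of a modeline, print vertical refresh and horizontal sync.
mode_rates() {
    LC_ALL=C awk -v pclk="$1" -v htotal="$2" -v vtotal="$3" 'BEGIN {
        printf "%.2f Hz %.2f kHz\n", pclk * 1e6 / (htotal * vtotal), pclk * 1e3 / htotal
    }'
}

# Values from the modeline above: 49.00 MHz, htotal 1312, vtotal 624.
mode_rates 49.00 1312 624   # prints "59.85 Hz 37.35 kHz"
```

The output name (VGA1 here) differs between machines; running `xrandr` without arguments lists the outputs that are actually present.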
Comment 18 Jani Saarinen 2018-03-29 07:11:35 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
Comment 19 Alex Villacís Lasso 2018-03-29 15:57:00 UTC
(In reply to Jani Saarinen from comment #18)
> First of all. Sorry about spam.
> This is mass update for our bugs. 
> 
> Sorry if you feel this annoying but with this trying to understand if bug
> still valid or not.
> If bug investigation still in progress, please ignore this and I apologize!
> 
> If you think this is not anymore valid, please comment to the bug that can
> be closed.
> If you haven't tested with our latest pre-upstream tree(drm-tip), can you do
> that also to see if issue is valid there still and if you cannot see issue
> there, please comment to the bug.

Currently running 4.15.10 in Fedora 27 on the Acer Aspire One. The bug has been fixed since 4.15.3.
Comment 20 Jani Saarinen 2018-03-30 06:49:56 UTC
OK, thanks for the feedback, resolving.
