Bug 101049 - [IVB] [bisected] Freeze when booting with kernel 4.9 and later
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Keywords: bisected
Reported: 2017-05-15 16:05 UTC by hisele
Modified: 2018-04-23 09:57 UTC

i915 platform: IVB
i915 features: display/Other


Description hisele 2017-05-15 16:05:30 UTC
With kernels 4.9 and later, my system freezes completely a few seconds after X starts. With gdm it happens after clicking on the user to log in, and in most cases only the mouse cursor is drawn.
I'm using a Radeon R9 390 together with the HD Graphics 4000 in the i7 3700k and found this bug when trying to use the integrated graphics. The bug appears only on kernel 4.9 and later, and only when the integrated graphics is chosen as the primary graphics in the BIOS. I'm surprised nobody has reported this bug so far, since it has been in the stable kernel for quite a long time now; maybe it only occurs with this specific setup. I haven't tried it without the AMD card yet.

The bug is reproducible with the drm-tip branch and after bisecting I found commit 29ecd78d3b79746fc837b820accb062f6433d5fb to be the problem, with this one reverted the problem was gone.

I also tried to get some kernel logs, but this bug freezes the system completely: not even the reset button works anymore, and after rebooting there were no signs of the crash.

Here's some information about my system:

Motherboard: Biostar H77MU3

lspci:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation H77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290/390] (rev 80)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X]
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
Comment 1 Chris Wilson 2017-05-15 16:14:08 UTC
(In reply to hisele from comment #0) 
> The bug is reproducible with the drm-tip branch and after bisecting I found
> commit 29ecd78d3b79746fc837b820accb062f6433d5fb to be the problem, with this
> one reverted the problem was gone.

Dubious as there is no direct functional change in that patch.
Comment 2 Chris Wilson 2017-05-15 16:18:03 UTC
Correcting myself, there was one which was to act upon the result of:

        if (IS_GEN6(dev_priv) ||
            IS_IVYBRIDGE(dev_priv) || IS_HASWELL(dev_priv)) {
                u32 params = 0;

                sandybridge_pcode_read(dev_priv, GEN6_READ_OC_PARAMS, &params);
                if (params & BIT(31)) { /* OC supported */
                        DRM_DEBUG_DRIVER("Overclocking supported, max: %dMHz, overclock: %dMHz\n",
                                         (dev_priv->rps.max_freq & 0xff) * 50,
                                         (params & 0xff) * 50);
                        dev_priv->rps.max_freq = params & 0xff;
                }
        }

So to test that:

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 417fd72f6968..a5a5b90d7044 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7530,8 +7530,7 @@ void intel_init_gt_powersave(struct drm_i915_private *dev_priv)
                              intel_freq_opcode(dev_priv, 450));
 
        /* After setting max-softlimit, find the overclock max freq */
-       if (IS_GEN6(dev_priv) ||
-           IS_IVYBRIDGE(dev_priv) || IS_HASWELL(dev_priv)) {
+       if (0) {
                u32 params = 0;
 
                sandybridge_pcode_read(dev_priv, GEN6_READ_OC_PARAMS, &params);
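
The decoding done by the quoted block can be tried outside the kernel. This is only a minimal sketch, assuming the layout visible in the snippet above (bit 31 flags overclocking support, the low byte is a frequency in units of 50 MHz). The function name `decode_oc_params` and the sample value are made up for illustration and were not read from real hardware.

```shell
#!/bin/sh
# Hypothetical helper, not part of i915: decode a GEN6_READ_OC_PARAMS-style
# value the same way the quoted kernel code does.
# Assumes the shell uses 64-bit arithmetic (true for common Linux shells).
decode_oc_params() {
    params=$(( $1 ))
    if [ $(( (params >> 31) & 1 )) -eq 1 ]; then
        # Low byte is the overclock frequency in 50 MHz steps.
        echo "overclock: $(( (params & 0xff) * 50 ))MHz"
    else
        echo "overclocking not supported"
    fi
}
```

For example, `decode_oc_params 0x8000001e` prints `overclock: 1500MHz`; the sample value was chosen to land on the 1500 reported in comment 16.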
Comment 3 hisele 2017-05-15 17:38:03 UTC
Yes, this change fixes the freezes.
Comment 4 Elizabeth 2017-06-20 17:23:22 UTC
(In reply to hisele from comment #3)
> Yes, this change fixes the freezes.

Closing the bug since it seems fixed. If there is any change on this case, please mark as REOPEN and share the information. Thanks.
Comment 5 hisele 2017-06-22 21:23:34 UTC
I don't think it has been fixed. The code causing the crash is still there: https://cgit.freedesktop.org/drm-tip/tree/drivers/gpu/drm/i915/intel_pm.c#n7731
Comment 6 hisele 2017-07-17 11:03:36 UTC
I just tried out the drm-tip branch and the bug still hasn't been fixed.
With the changes you recommended in comment 2 it works fine, so it seems like this is not hard to fix.
Comment 7 Elizabeth 2017-07-20 19:39:14 UTC
(In reply to hisele from comment #5)
> I don't think it has been fixed. The code causing the crash is still there:
> https://cgit.freedesktop.org/drm-tip/tree/drivers/gpu/drm/i915/intel_pm.c#n7731

True, thanks for the correction.
Comment 8 hisele 2017-08-08 20:47:02 UTC
Anything new? I'd really like to run my distribution's kernel again at some point.
Comment 9 hisele 2017-08-28 07:45:59 UTC
Still nothing? Why can't this be fixed?
Comment 10 Jani Nikula 2017-08-29 13:34:02 UTC
Chris?
Comment 11 Chris Wilson 2017-08-29 13:39:33 UTC
The value provided by the bios is unusable on that machine, check for a bios update; otherwise we can either quirk the platform, the manufacturer or the bios.
Comment 12 hisele 2017-08-29 13:53:50 UTC
Thank you, I'm using the latest bios, motherboard is a Biostar H77MU3.
Comment 13 balorg2000 2017-10-20 16:20:12 UTC
I have the same problem with an i5 3570k and MSI Z77A-G43 mainboard.
Comment 14 balorg2000 2018-02-20 20:49:25 UTC
Can this bug be fixed? Will it be fixed? Are there any other workarounds than building the kernel with the patch disabled?
Comment 15 Chris Wilson 2018-02-20 20:56:29 UTC
Have you checked for a bios update? The essence of the problem is that the bios provides a value that seems to cause a lockup. To prevent us using that value, we can do "cat /sys/class/drm/card0/gt_max_freq_mhz > /sys/class/drm/card0/gt_boost_freq_mhz".
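
The same command can be wrapped with a basic sanity check. A rough sketch, assuming the two sysfs files named above exist for the card; the function name `clamp_boost_freq` and its directory argument are illustrative, not an existing tool.

```shell
#!/bin/sh
# Hypothetical wrapper around the command above: copy gt_max_freq_mhz into
# gt_boost_freq_mhz for one DRM card directory. On a real system the
# directory would be /sys/class/drm/card0 and the call needs root.
clamp_boost_freq() {
    card_dir=$1
    max_file="$card_dir/gt_max_freq_mhz"
    boost_file="$card_dir/gt_boost_freq_mhz"
    # Bail out instead of creating stray files if the card is not there.
    [ -r "$max_file" ] && [ -w "$boost_file" ] || {
        echo "missing or unwritable frequency files in $card_dir" >&2
        return 1
    }
    cat "$max_file" > "$boost_file"
}
```

It would be called as `clamp_boost_freq /sys/class/drm/card0`. Taking the directory as an argument keeps the sketch usable for a different card index.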
Comment 16 balorg2000 2018-02-21 17:06:05 UTC
Thanks for your help. I'm using the latest BIOS from 2014-05. I tested your solution and it works. "/sys/class/drm/card0/gt_boost_freq_mhz" was at 1500.
I wrote this systemd service to set it on boot. Is there a better way to do it?

[Unit]
Description=fix iGPU boost clock

[Service]
Type=oneshot
ExecStart=/bin/sh -c '/usr/bin/cat /sys/class/drm/card0/gt_max_freq_mhz > /sys/class/drm/card0/gt_boost_freq_mhz'

[Install]
WantedBy=multi-user.target
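
Whether the unit took effect can be checked after boot by comparing the two sysfs values. A small sketch, assuming both files hold a plain integer in MHz; the function name `boost_within_max` is made up for this example.

```shell
#!/bin/sh
# Hypothetical check: succeed if gt_boost_freq_mhz does not exceed
# gt_max_freq_mhz in the given DRM card directory.
# Returns 0 if boost <= max, 1 if boost > max, 2 if a file cannot be read.
boost_within_max() {
    card_dir=$1
    max=$(cat "$card_dir/gt_max_freq_mhz") || return 2
    boost=$(cat "$card_dir/gt_boost_freq_mhz") || return 2
    [ "$boost" -le "$max" ]
}
```

For instance, `boost_within_max /sys/class/drm/card0 && echo ok` would print `ok` once the boost value has been lowered to the maximum.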
Comment 17 Jani Saarinen 2018-03-29 07:09:57 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but with this we are trying to understand whether the bug is still valid or not.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 18 Jani Saarinen 2018-04-23 09:57:39 UTC
Closing; please re-open if it still occurs.

