System Environment:
-----------------------------------------------------
Platform: SKL
Regression: Not sure. This bug was introduced by the fix for another known issue, bug-89388. We verified that the commit below (present in the drm-intel-next-queued branch) was intended to fix bug-89388, but it causes this issue. The console shows this error message: "[drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun"

==Kernel==
--------------------------------------------------
commit c9f038a1a5924352ab8e510e4a45ac57b08db391
Author:     Matt Roper <matthew.d.roper@intel.com>
AuthorDate: Mon Mar 9 11:06:02 2015 -0700
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Tue Mar 17 22:30:17 2015 +0100

    drm/i915: Don't assume primary & cursor are always on for wm calculation (v4)

    Current ILK-style watermark code assumes the primary plane and cursor plane are always enabled. This assumption, along with the combination of two independent commits that got merged at the same time, results in a NULL dereference. The offending commits are:

Bug detailed description:
--------------------------------------------------
Reproduce steps:
----------------------------
1. Boot up the system with an eDP display connected.
2. xinit &
3. gnome-session
4. Kill X.
5. Repeat steps 2-4; the machine will hang.
Created attachment 114544: dmesg info
This also happens when we change resolution in games (e.g. padman, etqw-demo, and many others) or exit/restart games, so it blocks most game testing.
I can't reproduce this one by starting X and killing it, nor by doing init 3/init 5 cycles. tjaalton on IRC has similar issues though, with the same commit:

<tjaalton> c9f038a1a592435 makes my skl-y hang for good on X restart

I don't see anything wrong with this commit. Maybe it's a side effect elsewhere in the WM code, triggered by that commit.
(In reply to Damien Lespiau from comment #3)
> I can't reproduce this one by starting X and killing it, nor by doing init
> 3/init 5 cycles.

xinit is not sufficient. Could you try gnome-session as well?
The first bad commit[1] is:

commit c9f038a1a5924352ab8e510e4a45ac57b08db391
Author: Matt Roper <matthew.d.roper@intel.com>
Date:   Mon Mar 9 11:06:02 2015 -0700

    drm/i915: Don't assume primary & cursor are always on for wm calculation (v4)

[1] There are two ways to reproduce the issue: "kill gnome-session twice" and "change game resolution". We verified the first bad commit using the "change game resolution" method.
Matt can you take a look?
This bug shows the same bisected first bad commit as Bug 89731 - [SKL bisected regression] system hangs when restarting X: https://bugs.freedesktop.org/show_bug.cgi?id=89731
(In reply to Gordon Jin from comment #4)
> (In reply to Damien Lespiau from comment #3)
> > I can't reproduce this one by starting X and killing it, nor by doing init
> > 3/init 5 cycles.
>
> xinit is not sufficient. Could you try gnome-session as well?

To confirm, you can xinit and kill X as many times as you like without problem, but if you start a full Gnome session, kill it, start a second Gnome session, and then kill it again, you see a hang?

I don't have access to a SKL platform, so I'm a bit blind here, but can you try a few things (assuming my understanding above is correct and also that you still see this issue on the latest -nightly)?

- Test with xinit, but make sure you move your mouse into and out of the xterm that usually gets started by default to ensure the mouse cursor has to appear/change before you kill X. I want to rule out the framebuffer reference counting issues we had with universal cursors recently.
- Start with xinit, but run xrandr to set varying display modes. I.e., can we easily trigger this crash by just switching modes with nothing else going on?
Also, what kind of displays are you using? There's a similar report in bug 89731 which indicates:

> IIRC this only happens on the SKL-Y which has a builtin eDP panel,
> but not on SKL-S hooked to my DP monitor.

Are you also using eDP here?

It's unclear to me whether my commit caused the problem here, or whether it just allowed us to get further along and hit a different problem in existing code. We probably need someone with a SKL platform to do a little debugging to narrow down where the crash happens.
(In reply to Matt Roper from comment #8)
> (In reply to Gordon Jin from comment #4)
> > (In reply to Damien Lespiau from comment #3)
> > > I can't reproduce this one by starting X and killing it, nor by doing init
> > > 3/init 5 cycles.
> >
> > xinit is not sufficient. Could you try gnome-session as well?
>
> To confirm, you can xinit and kill X as many times as you like without
> problem, but if you start a full Gnome session, kill it, start a second
> Gnome session, and then kill it again, you see a hang?

Yes, the system hangs. I can see a yellow light on the motherboard.

> I don't have access to a SKL platform, so I'm a bit blind here, but can you
> try a few things (assuming my understanding above is correct and also that
> you still see this issue on the latest -nightly)?

Yes, this issue still occurs on the latest -nightly.

> - Test with xinit, but make sure you move your mouse into and out of the
> xterm that usually gets started by default to ensure the mouse cursor has to
> appear/change before you kill X. I want to rule out the framebuffer
> reference counting issues we had with universal cursors recently.

Confirmed, I can move the mouse into and out of the xterm before killing X.

> - Start with xinit, but run xrandr to set varying display modes. I.e., can
> we easily trigger this crash by just switching modes with nothing else going
> on?

The screen turns black when setting varying display modes, and then the system hangs when X is killed.
(In reply to Matt Roper from comment #9)
> Also, what kind of displays are you using. There's a similar report in bug
> 89731 which indicates:
>
> > IIRC this only happens on the SKL-Y which has a builtin eDP panel,
> > but not on SKL-S hooked to my DP monitor.
>
> Are you also using eDP here?

Yes, I am using eDP here.
Can we just revert the skl part of this patch as an interim solution to unblock QA? Matt, can you please prepare a patch.
(In reply to Matt Roper from comment #8)
> (In reply to Gordon Jin from comment #4)
> > (In reply to Damien Lespiau from comment #3)
> > > I can't reproduce this one by starting X and killing it, nor by doing init
> > > 3/init 5 cycles.
> >
> > xinit is not sufficient. Could you try gnome-session as well?
>
> To confirm, you can xinit and kill X as many times as you like without
> problem,

Confirmed: I killed X more than 15 times and the system still had no problem.
(In reply to Daniel Vetter from comment #12)
> Can we just revert the skl part of this patch as an interim solution to
> unblock QA? Matt, can you please prepare a patch.

Reverting the SKL part of my patch will just result in NULL-dereference and immediate crashes when killing X or doing anything else that causes the primary plane to be disabled, so I think that will be even more crippling than the current situation (at least today it sounds like you can bring X down and back up at least once before running into problems).
(In reply to Matt Roper from comment #14)
> (In reply to Daniel Vetter from comment #12)
> > Can we just revert the skl part of this patch as an interim solution to
> > unblock QA? Matt, can you please prepare a patch.
>
> Reverting the SKL part of my patch will just result in NULL-dereference and
> immediate crashes when killing X or doing anything else that causes the
> primary plane to be disabled, so I think that will be even more crippling
> than the current situation (at least today it sounds like you can bring X
> down and back up at least once before running into problems).

Hm, right. Can we do a functional revert instead, like in the other wm code, i.e. assume that if state->fb == NULL then bpp == 4 and the primary plane spans the full screen? Horrible hack, I know, but that should at least get us out of this until someone can look at the SKL wm code for real.
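A minimal C sketch of the functional revert suggested above. It assumes hypothetical, simplified structures: `fb_sketch`, `plane_state_sketch`, `wm_plane_params` and the helper `primary_wm_params` are illustrative names only, not the actual i915 types or functions. When the primary plane has no framebuffer, the sketch falls back to 4 bytes per pixel and the full pipe width instead of dereferencing `state->fb`.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's framebuffer and
 * plane-state objects; the real i915 structures differ. */
struct fb_sketch {
    int bits_per_pixel;
};

struct plane_state_sketch {
    const struct fb_sketch *fb; /* NULL when the plane is disabled */
    int src_width;              /* plane width in pixels when enabled */
};

/* Inputs a watermark calculation would need for the primary plane. */
struct wm_plane_params {
    int bytes_per_pixel;
    int horiz_pixels;
};

/* Hypothetical helper: if the primary plane has no framebuffer, fall back
 * to the old "always on" assumption (4 bytes per pixel, full pipe width)
 * instead of dereferencing state->fb. */
static struct wm_plane_params
primary_wm_params(const struct plane_state_sketch *state, int pipe_hdisplay)
{
    struct wm_plane_params p;

    if (state->fb == NULL) {
        p.bytes_per_pixel = 4;
        p.horiz_pixels = pipe_hdisplay;
    } else {
        p.bytes_per_pixel = state->fb->bits_per_pixel / 8;
        p.horiz_pixels = state->src_width;
    }
    return p;
}
```

The fallback only keeps the watermark inputs defined for a disabled plane; it overestimates rather than computing real SKL watermarks, which is why it is framed as a stopgap.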
Tested on the latest nightly kernel (5ea91d) and latest Mesa (cc5860e4); this issue no longer exists on SKL.
Yes, current nightly works for me too, drm-intel-next-2015-04-10 doesn't. Is it the scaler stuff that fixed it?
Closing verified+fixed.