Bug 57495 - [945gm regression] GPU hangs on login to Unity
Summary: [945gm regression] GPU hangs on login to Unity
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-24 21:10 UTC by Kees Bakker
Modified: 2017-07-24 22:59 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg running 3.2.0-29 (ubuntu kernel) (71.14 KB, text/plain)
2012-11-24 21:45 UTC, Kees Bakker
dmesg running 3.2.0-29 (ubuntu kernel), retry (123.23 KB, text/plain)
2012-11-25 13:57 UTC, Kees Bakker
dmesg running 3.5.0-19.30 (ubuntu kernel) (119.32 KB, text/plain)
2012-11-27 21:15 UTC, Kees Bakker
/var/log/kern.log with the oops when starting Unity on 3.5.0 (75.83 KB, text/plain)
2012-11-27 21:29 UTC, Kees Bakker
i915_error_state after the oops (673.44 KB, text/plain)
2012-11-30 19:40 UTC, Kees Bakker
kern.log with 3.7.0-rc8 gives GPU hung (394.52 KB, text/plain)
2012-12-06 21:36 UTC, Kees Bakker
i915_error_state after "GPU hung" (673.56 KB, text/plain)
2012-12-06 21:37 UTC, Kees Bakker
again disable all outpus on driver load (1.35 KB, patch)
2012-12-06 21:59 UTC, Daniel Vetter
rebased patch on top of 3.7/drm-intel-fixes (1.38 KB, patch)
2012-12-11 20:33 UTC, Daniel Vetter
kern.log with WARNING at intel_display.c:3588 (159.87 KB, text/plain)
2012-12-13 22:36 UTC, Kees Bakker

Description Kees Bakker 2012-11-24 21:10:54 UTC
After updating Ubuntu on my Macbook2,1 (that's the 13" white macbook) I am not able to get a working display anymore. But when I boot the old 3.2.0 kernel it is (more or less) OK. (I say more or less, because 3D Unity isn't working. I had to switch to XFCE.)

In the update process I have installed development mesa packages too, but that didn't help either.

With 3.3.x kernel it continues to boot until it logs in, but then both keyboard and network are dead.

With 3.5.x kernel I simply get a black screen, and no keyboard, and no network. I think it gets stuck very early in the boot process.

So, one of the changes from Linux 3.2.0 to 3.3.x is not working.

I'd be most grateful if someone could give me a hint about what to do next to figure out what is wrong. Here is the lspci output for the device. Let me know what else you need.


root@makkie:~# lspci -v -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller])
	Subsystem: Intel Corporation Device 7270
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at 90380000 (32-bit, non-prefetchable) [size=512K]
	I/O ports at 20e0 [size=8]
	Memory at 80000000 (32-bit, prefetchable) [size=256M]
	Memory at 90400000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Kernel driver in use: i915
	Kernel modules: intelfb, i915
Comment 1 Chris Wilson 2012-11-24 21:17:54 UTC
You need to attach a drm.debug=6 dmesg (that is, append drm.debug=6 to your kernel command line and attach the output of dmesg) from 3.2, 3.3 and 3.5 (if possible). And please do see if you can run with a current kernel, say from the Ubuntu kernel-ppa/drm-intel-experimental.
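A minimal sketch of the step above, for readers unsure how to add the option. The helper name `add_kernel_param` is hypothetical, and the GRUB details in the comments assume a stock Ubuntu install with GRUB 2; other boot loaders will differ.

```shell
#!/bin/sh
# Append a kernel parameter to a command-line string unless it is already there.
# add_kernel_param is a hypothetical helper name, not part of any tool.
add_kernel_param() {
    cmdline=$1
    param=$2
    case " $cmdline " in
        *" $param "*) printf '%s\n' "$cmdline" ;;                 # already present
        *) printf '%s\n' "${cmdline:+$cmdline }$param" ;;         # append it
    esac
}

# Check whether the running kernel was booted with the parameter:
#   grep -o 'drm\.debug=[^ ]*' /proc/cmdline
# Assumption: Ubuntu with GRUB 2. Add drm.debug=6 to GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, run `sudo update-grub`, reboot, then capture the log:
#   dmesg > dmesg-drm-debug.txt
```

For example, `add_kernel_param "quiet splash" drm.debug=6` prints the string to put into the GRUB configuration.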
Comment 2 Kees Bakker 2012-11-24 21:45:30 UTC
Created attachment 70527 [details]
dmesg running 3.2.0-29 (ubuntu kernel)

This is the dmesg from using kernel 3.2.0-29
Comment 3 Kees Bakker 2012-11-24 21:49:34 UTC
BTW. I've tried one of the newest Ubuntu kernels too (3.7.0-030700rc6) but it didn't help. In fact with 3.3.8 I did get something on the display (but no keyboard, no network) and with 3.7.0 I just had a blank screen, and no keyboard, no network.
Comment 4 Daniel Vetter 2012-11-25 09:08:55 UTC
(In reply to comment #2)
> Created attachment 70527 [details]
> dmesg running 3.2.0-29 (ubuntu kernel)
> 
> This is the dmesg from using kernel 3.2.0-29

Can you please boot again with drm.debug=6 added to your kernel cmdline, and then grab the dmesg&attach it? We need that debug output to have an idea of what might be going on.
Comment 5 Daniel Vetter 2012-11-25 09:09:56 UTC
Also, getting yourself up to speed with compiling & installing a kernel from source would be good, so that we can test patches or, if there's no clue in dmesg, so that you can attempt a bisect.
Comment 6 Kees Bakker 2012-11-25 13:38:57 UTC
(In reply to comment #5)
> Also, getting yourself up to speed with compiling & installing a kernel from
> source would be good, so that we can test patches or, if there's no clue in
> dmesg, so that you can attempt a bisect.

Don't worry, I have been building kernels since 0.99pl15 :-) It doesn't make me an expert, but I think I know what I'm doing (emphasis on "think").
Comment 7 Kees Bakker 2012-11-25 13:57:07 UTC
Created attachment 70540 [details]
dmesg running 3.2.0-29 (ubuntu kernel), retry

I'm not sure what I did last night, but this time I have a lot more drm.debug messages. (The previous dmesg showed that I _did_ add the commandline option, but somehow it did not work.)
Comment 8 Kees Bakker 2012-11-25 14:01:59 UTC
Before I forget, thanks for taking the time so far.

And about dmesg for 3.3 or newer kernels, well, I wish I could. But none of these kernels give me a working network, nor a working keyboard. And also, booting from a USB stick isn't possible either. It's an Apple, need I say more?
Comment 9 Daniel Vetter 2012-11-25 15:54:29 UTC
Since neither keyboard nor network works for you on more recent kernels, I think we need to get that into working order first (since without that there's not much we can do). Can you please attempt to bisect where those regressions have been introduced?

Howto for bisecting https://wiki.ubuntu.com/Kernel/KernelBisection

If you can, try to boot with drm.debug=6 on a 3.5 kernel and then stitch the dmesg together from the logfiles in /var/log, maybe that gives us an idea as to what's going on.
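One way to do the stitching is to pull only the kernel lines out of the syslog files and strip the syslog prefix, so the result reads like dmesg output. This is a rough sketch: `stitch_kernel_log` is a hypothetical helper name, and the file names in the usage comment assume Ubuntu's default log location and rotation naming.

```shell
# Print only the kernel messages from syslog-style lines on stdin, with the
# "date time host kernel: " prefix removed so the output resembles dmesg.
# stitch_kernel_log is a hypothetical helper name.
stitch_kernel_log() {
    sed -n 's/^[A-Z][a-z][a-z] [ 0-9][0-9] [0-9:]\{8\} [^ ]* kernel: //p'
}

# Usage (assumption: Ubuntu's default /var/log layout, oldest file first):
#   cat /var/log/kern.log.1 /var/log/kern.log | stitch_kernel_log > dmesg-3.5.txt
```

Lines from other daemons are dropped, and the bracketed kernel timestamps make it possible to tell where one boot ends and the next begins.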
Comment 10 Kees Bakker 2012-11-25 19:29:21 UTC
(In reply to comment #9)
> Since neither keyboard nor network works for you on more recent kernels, I think we
> need to get that into working order first (since without that there's not
> much we can do). Can you please attempt to bisect where those regressions
> have been introduced?
> 
> Howto for bisecting https://wiki.ubuntu.com/Kernel/KernelBisection
> 
> If you can, try to boot with drm.debug=6 on a 3.5 kernel and then stitch the
> dmesg together from the logfiles in /var/log, maybe that gives us an idea as
> to what's going on.

Yeah, well OK. You got me :-) It's gonna be a challenge, but I will give it a shot. If it's OK with you I first want to bisect between 3.2.0 and 3.3.8 since the latter isn't working either. (( Too bad kernel.ubuntu.com is down at the moment and I need their git to get a good starting point. ))
Comment 11 Kees Bakker 2012-11-27 21:15:55 UTC
Created attachment 70690 [details]
dmesg running 3.5.0-19.30 (ubuntu kernel)

Here is the dmesg of 3.5.0

That means I can boot into a 3.5.0 kernel with a working network (and keyboard). I have no idea what happened with my previous (failed) attempts to boot 3.5.0

However, it does not mean that this bug report can be closed, because the system will crash (there is an OOPS) as soon as I log into Unity. XFCE seems to be working though.

Let me dig into it further and I'll report my progress in the comments. I'll see if I can get it to OOPS again.
Comment 12 Kees Bakker 2012-11-27 21:29:54 UTC
Created attachment 70692 [details]
/var/log/kern.log with the oops when starting Unity on 3.5.0

Creating the oops was not difficult at all. The steps to reproduce are as follows:
* boot system with 3.5.0
* observe that login screen appears (system still OK)
* log in to Unity
* screen goes black

For the oops messages, see /var/log/kern.log
Comment 13 Chris Wilson 2012-11-28 23:53:00 UTC
Note that prior to the OOPS there is a GPU hang. So please attach /sys/kernel/debug/dri/0/i915_error_state. You will also want linux-3.7 and xserver-xorg-video-intel-2.20.14 to fix a few known hangs that could affect login.
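A small sketch of capturing that file so it can be attached. `save_error_state` is a hypothetical helper name and the output file name is arbitrary; reading the default path requires debugfs to be mounted and, typically, root access.

```shell
# Copy the i915 error state to a regular file that can be attached to the bug.
# save_error_state is a hypothetical helper; the default source path is the one
# requested above and needs debugfs mounted (usually read as root).
save_error_state() {
    src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    dest=${2:-i915_error_state.txt}
    if [ ! -r "$src" ]; then
        echo "cannot read $src (is debugfs mounted? are you root?)" >&2
        return 1
    fi
    cat "$src" > "$dest"
}

# Usage, as root, after the hang has been logged:
#   save_error_state
```

If the path does not exist, mounting debugfs with `mount -t debugfs none /sys/kernel/debug` is usually the missing step.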
Comment 14 Kees Bakker 2012-11-30 19:40:02 UTC
Created attachment 70842 [details]
i915_error_state after the oops

I'm not sure i915_error_state is different before and after the oops. Anyway, this is from after the oops.
Comment 15 Kees Bakker 2012-11-30 19:40:49 UTC
(In reply to comment #13)
> Note that prior to the OOPS there is a GPU hang. So please attach
> /sys/kernel/debug/dri/0/i915_error_state. You will also want linux-3.7 and
> xserver-xorg-video-intel-2.20.14 to fix a few known hangs that could affect
> login.

Thanks for the note. I'll give it a try.
Comment 16 Chris Wilson 2012-12-01 00:10:35 UTC
Ok, that hang doesn't look to be connected, though it should be fixed in recent kernels. The oops is most likely to be fixed with an updated ddx/kernel, though there is a lingering issue to be fixed soon...
Comment 17 Kees Bakker 2012-12-03 20:50:38 UTC
ddx => Device Dependent X (DDX) driver, in this case xserver-xorg-video-intel, right?
Comment 18 Daniel Vetter 2012-12-03 20:59:32 UTC
(In reply to comment #17)
> ddx => Device Dependent X (DDX) driver, in this case
> xserver-xorg-video-intel, right?

Yep
Comment 19 Kees Bakker 2012-12-06 21:36:40 UTC
Created attachment 71099 [details]
kern.log with 3.7.0-rc8 gives GPU hung

So I have installed 3.7.0-rc8 (from Ubuntu's kernel PPA) and the xorg packages from Ubuntu's xorg-edgers PPA.

The good news is that the system does not die anymore, no oopses either.

The bad news is that logging into Unity gives nothing but an empty desktop. It just sits there. With the mouse I can move the cursor, but I can't do anything.

If I start a gnome-terminal using a remote login the following happens: the gnome-terminal window is shown (no prompt), and after just a few seconds the screen gets corrupted. In /var/log/kern.log I see

Dec  6 22:22:20 makkie kernel: [  330.352121] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Dec  6 22:22:20 makkie kernel: [  330.352221] [drm:i915_error_work_func], resetting chip
Dec  6 22:22:20 makkie kernel: [  330.352541] [drm:i915_reset] *ERROR* Failed to reset chip.

(( I'll add i915_error_state in a moment. ))
Comment 20 Kees Bakker 2012-12-06 21:37:17 UTC
Created attachment 71100 [details]
i915_error_state after "GPU hung"
Comment 21 Kees Bakker 2012-12-06 21:39:29 UTC
(In reply to comment #19)
> 
> So I have installed 3.7.0-rc8 (from Ubuntu's kernel PPA) and the xorg
> packages from Ubuntu's xorg-edgers PPA.

That's xserver-xorg-video-intel 2:2.20.15+git20121203.6f675eea-0ubuntu0sarvatt~quantal
Comment 22 Chris Wilson 2012-12-06 21:51:01 UTC
Please apply

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_d
index f8ee3d1..9d85779 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -8830,6 +8830,8 @@ void intel_modeset_init(struct drm_device *dev)
        /* Just disable it once at startup */
        i915_disable_vga(dev);
        intel_setup_outputs(dev);
+
+       drm_helper_disable_unused_functions(dev);
 }
 
 static void


to prevent the earlier hang during module load (danvet!!!!) and reattach the error-state from launching unity.
Comment 23 Daniel Vetter 2012-12-06 21:59:21 UTC
Created attachment 71101 [details] [review]
again disable all outpus on driver load

Please test with this patch instead of Chris', this should work with the new modeset code.
Comment 24 Jesse Barnes 2012-12-11 18:40:11 UTC
Any luck Kees?  did Daniel's patch help at all with current kernels?
Comment 25 Kees Bakker 2012-12-11 20:00:59 UTC
(In reply to comment #24)
> Any luck Kees?  did Daniel's patch help at all with current kernels?

No, I just built a 3.7.0-6.14 kernel (which is the latest in ubuntu-raring), and now I wanted to apply the patch but it doesn't apply. I'll have to look up what Daniel means by "the new modeset code".

To summarize my latest findings, the kernel does not oops anymore, but the graphics environment does not like 3D (only XFCE and Gnome Classic No Effects work).
Comment 26 Daniel Vetter 2012-12-11 20:07:42 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > Any luck Kees?  did Daniel's patch help at all with current kernels?
> 
> No, I just built a 3.7.0-6.14 kernel (which is the latest in ubuntu-raring),
> and now I wanted to apply the patch but it doesn't apply. I'll have to
> lookup what Daniel means by "the new modeset code".

Might be a conflict, just add the loop at the end of the modeset_init functions like the patch does. Patch should apply though on the latest drm-intel-fixes branch from

http://cgit.freedesktop.org/~danvet/drm-intel

> To summarize my latest findings, the kernel does not oops anymore, but the
> graphics environment does not like 3D (only XFCE and Gnome Classic No
> Effects work).

That's expected - i945gme is simply too old for glsl and iirc latest unity requires that now. Or do you mean that 3D in general is busted?
Comment 27 Kees Bakker 2012-12-11 20:23:31 UTC
(In reply to comment #26)
> (In reply to comment #25)
> > (In reply to comment #24)
> > > Any luck Kees?  did Daniel's patch help at all with current kernels?
> > 
> > No, I just built a 3.7.0-6.14 kernel (which is the latest in ubuntu-raring),
> > and now I wanted to apply the patch but it doesn't apply. I'll have to
> > lookup what Daniel means by "the new modeset code".
> 
> Might be a conflict, just add the loop at the end of the modeset_init
> functions like the patch does. Patch should apply though on the latest
> drm-intel-fixes branch from
> 
> http://cgit.freedesktop.org/~danvet/drm-intel

No it doesn't. But see below...

> 
> > To summarize my latest findings, the kernel does not oops anymore, but the
> > graphics environment does not like 3D (only XFCE and Gnome Classic No
> > Effects work).
> 
> That's expected - i945gme is simply too old for glsl and iirc latest unity
> requires that now. Or do you mean that 3D in general is busted?

No, 3D is not busted in general, just on my Macbook. By now I know that this hardware simply can't do the 3D stuff. It never could. I was more or less expecting (hoping) that the system would _tell_ me that I can't do 3D, instead of screen corruption and such.

And indeed, the Ubuntu folks decided that Unity is 3D only. Thanks guys :-(

Now that we know all this, AND the kernel is behaving much better, shouldn't we close this bug? Is it worth the effort to try the patch, and what problem (besides 3D) were we trying to solve?
Comment 28 Daniel Vetter 2012-12-11 20:33:14 UTC
Created attachment 71357 [details] [review]
rebased patch on top of 3.7/drm-intel-fixes

Indeed, I've mixed up my branches, please try this patch here.
Comment 29 Daniel Vetter 2012-12-11 20:35:13 UTC
(In reply to comment #27)
> No, 3D is not busted in general, just on my Macbook. By now I know that this
> hardware simply can't do the 3D stuff. It never could. I was more or less
> expecting (hoping) that the system would _tell_ me that I can't do 3D,
> instead of screen corruption and such.

Nah, i945gm should have 3D. At least it works here - but first we need to retest with the above patch applied to ensure your gpu is still alive when you boot into the desktop. Please check dmesg for any "gpu hang" messages or other issues, and if there's nothing check glxinfo and whether glxgears works.
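The check sequence described above could be scripted roughly as follows. `has_gpu_hang` is a hypothetical helper name, and the match pattern is an assumption based on the "GPU hung" message quoted earlier in this report; other kernel versions may word it differently.

```shell
# Return success if the log text on stdin reports a GPU hang.
# has_gpu_hang is a hypothetical helper; the pattern is an assumption based on
# the "Hangcheck timer elapsed... GPU hung" message seen in kern.log.
has_gpu_hang() {
    grep -E -i -q 'gpu (hung|hang)'
}

# Usage:
#   if dmesg | has_gpu_hang; then
#       echo "GPU hang logged; grab i915_error_state"
#   else
#       glxinfo | grep -i render   # then try: glxgears
#   fi
```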
Comment 30 Daniel Vetter 2012-12-11 20:36:34 UTC
(In reply to comment #27)
> Now that we know all this, AND the kernel is behaving much better, shouldn't
> we close this bug? Is it worth the effort to try the patch, and what problem
> (besides 3D) were we trying to solve?

Yours. So as long as you're willing to stick around, we're willing to dig into things. No promises though, since some bugs are really hard (but yours here doesn't yet look like it falls into that category).
Comment 31 Kees Bakker 2012-12-11 20:42:13 UTC
(In reply to comment #30)
> (In reply to comment #27)
> > Now that we know all this, AND the kernel is behaving much better, shouldn't
> > we close this bug? Is it worth the effort to try the patch, and what problem
> > (besides 3D) were we trying to solve?
> 
> Yours. So as long as you're willing to stick around, we're willing to dig
> into things. No promises though, since some bugs are really hard (but yours
> here doesn't yet look like it falls into that category).

Mine? Ah great :-) In that case I'll stick around. The patch applied, and now I have to rebuild the kernel packages. This takes a while on that Macbook ...
Comment 32 Kees Bakker 2012-12-13 19:56:18 UTC
(In reply to comment #29)
> (In reply to comment #27)
> > No, 3D is not busted in general, just on my Macbook. By now I know that this
> > hardware simply can't do the 3D stuff. It never could. I was more or less
> > expecting (hoping) that the system would _tell_ me that I can't do 3D,
> > instead of screen corruption and such.
> 
> Nah, i945gm should have 3D. At least it works here - but first we need to
> retest with the above patch applied to ensure your gpu is still alive when
> you boot into the desktop. Please check dmesg for any "gpu hang" messages or
> other issues, and if there's nothing check glxinfo and whether glxgears
> works.

Patch applied, and kernel built. glxgears is working. Here is some logging of that

kees@makkie:~$ glxgears 
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
117 frames in 5.0 seconds = 23.358 FPS
183 frames in 5.0 seconds = 36.554 FPS
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 1033 requests (1033 known processed) with 0 events remaining.
kees@makkie:~$ glxinfo |grep -i render
direct rendering: Yes
OpenGL renderer string: Mesa DRI Intel(R) 945GM x86/MMX/SSE2

Is that enough info for you about that patch?
Comment 33 Chris Wilson 2012-12-13 20:44:00 UTC
(In reply to comment #32)
> kees@makkie:~$ glxgears 
> Running synchronized to the vertical refresh.  The framerate should be
> approximately the same as the monitor refresh rate.
> 117 frames in 5.0 seconds = 23.358 FPS
> 183 frames in 5.0 seconds = 36.554 FPS
> XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
>       after 1033 requests (1033 known processed) with 0 events remaining.
> kees@makkie:~$ glxinfo |grep -i render
> direct rendering: Yes
> OpenGL renderer string: Mesa DRI Intel(R) 945GM x86/MMX/SSE2

That's an abysmal level of performance. What size is the glxgears window and how large are the gears? What happens without unity?
Comment 34 Kees Bakker 2012-12-13 21:55:18 UTC
(In reply to comment #33)
> (In reply to comment #32)
> > kees@makkie:~$ glxgears 
> > Running synchronized to the vertical refresh.  The framerate should be
> > approximately the same as the monitor refresh rate.
> > 117 frames in 5.0 seconds = 23.358 FPS
> > 183 frames in 5.0 seconds = 36.554 FPS
> > XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
> >       after 1033 requests (1033 known processed) with 0 events remaining.
> > kees@makkie:~$ glxinfo |grep -i render
> > direct rendering: Yes
> > OpenGL renderer string: Mesa DRI Intel(R) 945GM x86/MMX/SSE2
> 
> > That's an abysmal level of performance. What size is the glxgears window and
> how large are the gears? What happens without unity?

The size of the window doesn't matter, even if I fill the whole screen. The funny thing is that it is really smooth if I keep moving the cursor (or even the whole window), at 50 fps. But as soon as I leave the cursor or keyboard untouched it becomes jerky.

This is with Gnome Shell. Unity got removed in one of my recent upgrades because I am using xorg-edgers packages. And now unity can't be installed because of missing dependencies.

I've tried "gnome classic (no effects)" and that gave the same glxgears performance. (If that is what you wanted to know.)
Comment 35 Chris Wilson 2012-12-13 22:10:09 UTC
Oh, it's cpuidle/cpufreq. Set your cpufreq governor to performance.
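A hedged sketch of switching the governor through the cpufreq sysfs interface, assuming the kernel exposes `scaling_governor` files under /sys/devices/system/cpu. `set_cpu_governor` is a hypothetical helper name; the sysfs root is a parameter so the function can be tried on a scratch directory first.

```shell
# Write a cpufreq governor name to every CPU's scaling_governor file.
# set_cpu_governor is a hypothetical helper; the second argument overrides the
# sysfs root, which defaults to the standard cpufreq location.
set_cpu_governor() {
    governor=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue                      # skip CPUs without cpufreq
        printf '%s\n' "$governor" > "$f" || return 1
    done
}

# Usage, as root:
#   set_cpu_governor performance
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```

The setting does not survive a reboot, which is convenient for a quick test of whether the governor explains the jerky glxgears output.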
Comment 36 Kees Bakker 2012-12-13 22:36:30 UTC
Created attachment 71473 [details]
kern.log with WARNING at intel_display.c:3588

This is the latest kern.log. I just happened to see the WARNINGs. I'm not sure at what stage these warnings were produced. If you want more details about it, just let me know.
Comment 37 Daniel Vetter 2013-01-14 17:56:32 UTC
Do you still see the WARNs on latest 3.8-rc?

The slowness without wiggling mouse is bug #30364

Are there any other issues left from the original regression?
Comment 38 Kees Bakker 2013-01-14 20:39:41 UTC
(In reply to comment #37)
> Do you still see the WARNs on latest 3.8-rc?

No, the WARNINGS are gone.

> 
> The slowness without wiggling mouse is bug #30364

Ah, OK

> 
> Are there any other issues left from the original regression?

No I think it is fine now. Thanks for looking into this.
Comment 39 Daniel Vetter 2013-01-14 20:51:10 UTC
Thanks for your update, I'll close this one. If something new pops up, please file a new bug - this one here is getting a bit messy.

