Bug 70895 - BUG: soft lockup - CPU#0 stuck for 22s! [Xorg:1292] after disabling discrete card with switcheroo
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-10-26 15:10 UTC by Marcin Zajaczkowski
Modified: 2014-11-21 00:05 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments
kernel log with stack trace (48.07 KB, text/plain)
2013-10-26 15:10 UTC, Marcin Zajaczkowski

Description Marcin Zajaczkowski 2013-10-26 15:10:17 UTC
Created attachment 88154
kernel log with stack trace

After disabling the discrete graphics card with:
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

the nouveau driver seems to react:
kernel: [  113.072192] VGA switcheroo: switched nouveau off
Oct 26 16:07:52 localhost kernel: [  113.072208] ACPI Warning: \_SB_.PCI0.PEGR.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
kernel: [  113.072350] nouveau  [     DRM] suspending fbcon...
kernel: [  113.072354] nouveau  [     DRM] suspending display...
kernel: [  113.072360] nouveau  [     DRM] unpinning framebuffer(s)...
kernel: [  113.072400] nouveau  [     DRM] evicting buffers...
kernel: [  113.085655] nouveau  [     DRM] waiting for kernel channels to go idle...
kernel: [  113.085682] nouveau  [     DRM] suspending client object trees...
kernel: [  113.085976] nouveau  [     DRM] suspending kernel object tree...
kernel: [  113.865315] nouveau 0000:01:00.0: power state changed by ACPI to D3cold

but xrandr --listproviders hangs the whole system with:
kernel: [  324.008068] BUG: soft lockup - CPU#0 stuck for 22s! [Xorg:1292]

Detailed log with stack trace attached.

System specification:
Asus N43SN
GeForce GT 550M + integrated Intel card using the i915 driver (NVIDIA Optimus)
kernel-3.11.6-200.fc19.x86_64
xorg-x11-drv-nouveau-1.0.9-1.fc19.x86_64
xorg-x11-server-Xorg-1.14.3-1.fc19.x86_64

$ xrandr --listproviders
Providers: number : 3
Provider 0: id: 0x8e cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 2 outputs: 4 associated providers: 2 name:Intel
Provider 1: id: 0x63 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 2 outputs: 2 associated providers: 2 name:nouveau
Provider 2: id: 0x63 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 2 outputs: 2 associated providers: 2 name:nouveau
Comment 1 Emil Velikov 2014-01-13 02:48:05 UTC
AFAICS this is the same issue as the one reported in the kernel bugzilla [1]. Can you try the patch [2] and let us know if you're still having problems.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=64891
[2] https://patchwork.kernel.org/patch/3416861/
Comment 2 Marcin Zajaczkowski 2014-11-20 23:31:52 UTC
Hello again. I upgraded my system to Fedora 21 and with 3.17.3-300.fc21.x86_64 I cannot reproduce the original issue (the "CPU#0 stuck for 22s" lockup).

Unfortunately, I have a problem switching off the discrete card. By default:
# echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
does nothing; the discrete card is still in the DynPwr state:

$ sudo cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynPwr:0000:01:00.0
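The switch file prints one colon-separated line per GPU. A minimal Python sketch for checking the discrete card's power state from that text, assuming the `index:client:active:power:PCI address` layout inferred from the output in this report (not taken from kernel documentation); `parse_switch_state` and `discrete_power` are hypothetical helper names:

```python
from typing import List, NamedTuple, Optional


class SwitcherooClient(NamedTuple):
    index: int
    client: str       # "IGD" (integrated) or "DIS" (discrete) in this report
    active: bool      # "+" marks the GPU currently driving the display
    power: str        # "Pwr", "DynPwr" or "Off" in this report
    pci_address: str  # e.g. "0000:01:00.0"


def parse_switch_state(text: str) -> List[SwitcherooClient]:
    """Parse lines like '1:DIS: :DynPwr:0000:01:00.0' (layout assumed from this report)."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=4 keeps the colon-separated PCI address in one field;
        # the line itself is not stripped, because the "active" field can be a space.
        index, client, active, power, pci_address = line.split(":", 4)
        clients.append(
            SwitcherooClient(int(index), client, active == "+", power, pci_address)
        )
    return clients


def discrete_power(text: str) -> Optional[str]:
    """Return the power field of the "DIS" client, or None if there is none."""
    for c in parse_switch_state(text):
        if c.client == "DIS":
            return c.power
    return None


if __name__ == "__main__":
    # Output captured in this report with the default boot options.
    sample = "0:IGD:+:Pwr:0000:00:02.0\n1:DIS: :DynPwr:0000:01:00.0\n"
    print(discrete_power(sample))  # DynPwr
```

On a live system the input would be the contents of /sys/kernel/debug/vgaswitcheroo/switch, which the reporter reads with sudo.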

Booting with nouveau.runpm=0 I have both cards on:
$ sudo cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :Pwr:0000:01:00.0

and after
# echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

I have:
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :Off:0000:01:00.0

Looks ok, but I still see 3 providers:
$ xrandr --listproviders
Providers: number : 3
Provider 0: id: 0x90 cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 3 outputs: 5 associated providers: 2 name:Intel
Provider 1: id: 0x63 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 2 outputs: 2 associated providers: 2 name:nouveau
Provider 2: id: 0x63 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 2 outputs: 2 associated providers: 2 name:nouveau

and the system temperature suggests that the card may still be on. I see no messages from VGA switcheroo in the system log (unlike with 3.11.6).
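To compare provider listings like the one above across kernels or setups, a small parser can count the providers and flag repeated ids (in this output both nouveau entries report id 0x63). A minimal Python sketch, assuming the `Provider N: id: 0x...` ... `name:...` line format shown in this report; `parse_providers` and `duplicate_ids` are hypothetical helpers, not part of xrandr:

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches lines such as:
# "Provider 1: id: 0x63 cap: 0x7, ... associated providers: 2 name:nouveau"
PROVIDER_RE = re.compile(r"^Provider (\d+): id: (0x[0-9a-fA-F]+)\b.*\bname:(\S+)")


def parse_providers(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, id, name) for every provider line (format assumed from this report)."""
    providers = []
    for line in text.splitlines():
        m = PROVIDER_RE.match(line.strip())
        if m:  # the "Providers: number : N" header does not match
            providers.append((int(m.group(1)), m.group(2).lower(), m.group(3)))
    return providers


def duplicate_ids(providers: List[Tuple[int, str, str]]) -> List[str]:
    """Return provider ids that appear more than once."""
    counts = Counter(pid for _, pid, _ in providers)
    return sorted(pid for pid, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = """Providers: number : 3
Provider 0: id: 0x90 cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 3 outputs: 5 associated providers: 2 name:Intel
Provider 1: id: 0x63 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 2 outputs: 2 associated providers: 2 name:nouveau
Provider 2: id: 0x63 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 2 outputs: 2 associated providers: 2 name:nouveau
"""
    providers = parse_providers(sample)
    print(len(providers), duplicate_ids(providers))  # 3 ['0x63']
```

Fed the `xrandr --listproviders` output captured after `echo OFF`, it reports three providers with 0x63 repeated.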

When I use bbswitch with Bumblebee (on the second kernel), I see only one provider and the temperature is a few (~6) degrees lower.

Should I close that issue and report another issue?
Comment 3 Ilia Mirkin 2014-11-20 23:41:03 UTC
(In reply to Marcin Zajaczkowski from comment #2)
> Hello again. I upgraded my system to Fedora 21 and with
> 3.17.3-300.fc21.x86_64 I cannot reproduce the original issue (the "CPU#0
> stuck for 22s" lockup).
> 
> Unfortunately, I have a problem switching off the discrete card. By default:
> # echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
> does nothing; the discrete card is still in the DynPwr state:

That's expected. What's unexpected is that something is keeping a reference to your GPU which isn't letting it power off. Similar to bug #70875.

> Booting with nouveau.runpm=0 I have both cards on:
> $ sudo cat /sys/kernel/debug/vgaswitcheroo/switch
> 0:IGD:+:Pwr:0000:00:02.0
> 1:DIS: :Pwr:0000:01:00.0
> 
> and after
> # echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
> 
> I have:
> 0:IGD:+:Pwr:0000:00:02.0
> 1:DIS: :Off:0000:01:00.0
> 
> Looks ok, but I still see 3 providers:
> $ xrandr --listproviders
> Providers: number : 3
> Provider 0: id: 0x90 cap: 0xb, Source Output, Sink Output, Sink Offload
> crtcs: 3 outputs: 5 associated providers: 2 name:Intel
> Provider 1: id: 0x63 cap: 0x7, Source Output, Sink Output, Source Offload
> crtcs: 2 outputs: 2 associated providers: 2 name:nouveau
> Provider 2: id: 0x63 cap: 0x7, Source Output, Sink Output, Source Offload
> crtcs: 2 outputs: 2 associated providers: 2 name:nouveau
> 
> and the system temperature suggests that the card may still be on. I see no
> messages from VGA switcheroo in the system log (unlike with 3.11.6).

That is odd... not sure what that situation is.

> Should I close that issue and report another issue?

Yes. Don't combine multiple issues into one bug; it gets unwieldy.
Comment 4 Marcin Zajaczkowski 2014-11-21 00:05:37 UTC
Closing as not reproducible with 3.17.3-300.fc21.x86_64.

> That's expected. What's unexpected is that something is keeping a reference
> to your GPU which isn't letting it power off. Similar to bug #70875.

That's also a bug reported by me :).

I created bug 86503 for the problem of the card staying on while being reported as Off.

