Created attachment 75598: Xorg log file

I am not able to log out or shut down my system, a laptop with hybrid graphics, without triggering a hard lockup. However, this only happens if the dedicated AMD GPU has been powered off by vgaswitcheroo. It might also be related to PRIME support being enabled. The latest version of xserver-xorg-video-radeon for Ubuntu 13.04 is installed (1:7.1.0-0ubuntu1).

$ lshw -C display
  *-display
       description: VGA compatible controller
       product: RV710 [Mobility Radeon HD 4300 Series]
       vendor: Advanced Micro Devices [AMD] nee ATI
       physical id: 0
       bus info: pci@0000:01:00.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=radeon latency=0
       resources: irq:46 memory:d0000000-dfffffff ioport:3000(size=256) memory:f4400000-f440ffff memory:f4420000-f443ffff
  *-display
       description: Display controller
       product: Mobile 4 Series Chipset Integrated Graphics Controller
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 07
       width: 64 bits
       clock: 33MHz
       capabilities: msi pm bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:45 memory:f0000000-f03fffff memory:e0000000-efffffff ioport:4110(size=8)

Here is the syslog output of the bug:

-------------------------------------------------------------------------
[ 142.230685] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 142.230819] IP: [<ffffffffa01f1ba5>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]
[ 142.230977] PGD 0
[ 142.231014] Oops: 0000 [#1] SMP
[ 142.231075] Modules linked in: dm_crypt(F) kvm_intel kvm acer_wmi sparse_keymap snd_hda_codec_realtek xt_hl(F) ip6t_rt(F) snd_hda_intel snd_hda_codec snd_hwdep(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ipt_REJECT(F) microcode(F) xt_LOG(F) snd_pcm(F) xt_limit(F) xt_tcpudp(F) snd_page_alloc(F) xt_addrtype(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) arc4(F) psmouse(F) nf_conntrack_ipv4(F) serio_raw(F) nf_defrag_ipv4(F) xt_state(F) iwldvm snd_seq(F) mac80211 ip6table_filter(F) snd_seq_device(F) ip6_tables(F) snd_timer(F) iwlwifi lpc_ich nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) nf_nat_ftp(F) nf_nat(F) nf_conntrack_ftp(F) nf_conntrack(F) iptable_filter(F) cfg80211 ip_tables(F) joydev(F) x_tables(F) snd(F) soundcore(F) mac_hid binfmt_misc(F) coretemp lp(F) parport(F) hid_generic usbhid hid radeon i915 i2c_algo_bit ttm drm_kms_helper wmi r8169 ahci(F) drm libahci(F) video(F)
[ 142.232175] CPU 0
[ 142.232175] Pid: 1135, comm: Xorg Tainted: GF 3.8.0-7-generic #15-Ubuntu Acer TravelMate 8471/TravelMate 8471
[ 142.232175] RIP: 0010:[<ffffffffa01f1ba5>] [<ffffffffa01f1ba5>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]
[ 142.232175] RSP: 0018:ffff88013752bc28 EFLAGS: 00010282
[ 142.232175] RAX: ffffc900047a2f34 RBX: 0000000000000000 RCX: 0000000000000000
[ 142.232175] RDX: 0000000000000000 RSI: 0000000000002f34 RDI: ffff8801359d6000
[ 142.232175] RBP: ffff88013752bc38 R08: 0000000000000000 R09: 0000000000000000
[ 142.232175] R10: ffffea0004d3de00 R11: ffffffffa001a448 R12: ffff8801359d6000
[ 142.232175] R13: 0000000000000225 R14: 0000000000000225 R15: ffffffffa025d560
[ 142.232175] FS: 00007fa8bea89940(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 142.232175] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 142.232175] CR2: 0000000000000000 CR3: 0000000137882000 CR4: 00000000000407f0
[ 142.232175] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 142.232175] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 142.232175] Process Xorg (pid: 1135, threadinfo ffff88013752a000, task ffff88013750c5c0)
[ 142.232175] Stack:
[ 142.232175] ffff8801359d6000 0000000000000225 ffff88013752bc68 ffffffffa01c65d7
[ 142.232175] ffff880134f73080 0000000000000002 ffff8801359d69c0 ffff8801397e2848
[ 142.232175] ffff88013752bc78 ffffffffa01c3baa ffff88013752bc90 ffffffffa0099087
[ 142.232175] Call Trace:
[ 142.232175] [<ffffffffa01c65d7>] radeon_gart_unbind+0xa7/0xe0 [radeon]
[ 142.232175] [<ffffffffa01c3baa>] radeon_ttm_backend_unbind+0x1a/0x20 [radeon]
[ 142.232175] [<ffffffffa0099087>] ttm_tt_unbind+0x27/0x40 [ttm]
[ 142.232175] [<ffffffffa0099693>] ttm_bo_cleanup_memtype_use+0x33/0x90 [ttm]
[ 142.232175] [<ffffffffa009a930>] ttm_bo_release+0x210/0x280 [ttm]
[ 142.232175] [<ffffffffa009a9d1>] ttm_bo_unref+0x31/0x40 [ttm]
[ 142.232175] [<ffffffffa01c5407>] radeon_bo_unref+0x47/0x80 [radeon]
[ 142.232175] [<ffffffffa01d7cf9>] radeon_gem_object_free+0x39/0x40 [radeon]
[ 142.232175] [<ffffffffa0010aba>] drm_gem_object_free+0x2a/0x30 [drm]
[ 142.232175] [<ffffffffa00111e8>] drm_gem_handle_delete+0xf8/0x130 [drm]
[ 142.232175] [<ffffffffa0011648>] drm_gem_close_ioctl+0x28/0x30 [drm]
[ 142.232175] [<ffffffffa000f559>] drm_ioctl+0x4e9/0x5b0 [drm]
[ 142.232175] [<ffffffffa0011620>] ? drm_gem_destroy+0x60/0x60 [drm]
[ 142.232175] [<ffffffff8115c14b>] ? unmap_region+0xdb/0x120
[ 142.232175] [<ffffffff8115c453>] ? remove_vma+0x63/0x70
[ 142.232175] [<ffffffff811a5059>] do_vfs_ioctl+0x99/0x570
[ 142.232175] [<ffffffff8115e488>] ? do_munmap+0x328/0x410
[ 142.232175] [<ffffffff811a55c1>] sys_ioctl+0x91/0xb0
[ 142.232175] [<ffffffff816cc5dd>] system_call_fastpath+0x1a/0x1f
[ 142.232175] Code: 00 c1 e8 04 83 f8 02 74 29 85 c0 74 c9 5b 41 5c 5d c3 0f 1f 40 00 31 c9 31 d2 be 34 2f 00 00 48 8b 9f 90 03 00 00 e8 5b f1 fe ff <8b> 03 e9 42 ff ff ff 48 c7 c7 10 b3 24 a0 31 c0 e8 a0 5c 4c e1
[ 142.232175] RIP [<ffffffffa01f1ba5>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]
[ 142.232175] RSP <ffff88013752bc28>
[ 142.232175] CR2: 0000000000000000
[ 142.294959] ---[ end trace aabd94dad6d98857 ]---
-------------------------------------------------------------------------
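The faulting instruction can be read off the oops "Code:" line. On x86, the byte wrapped in angle brackets is the first byte of the instruction at RIP; here that is <8b> 03, which decodes as mov (%rbx),%eax, a 32-bit load through RBX. With RBX and CR2 both 0, r600_pcie_gart_tlb_flush is reading through a NULL pointer. The small helper below is a minimal sketch for pulling those marked bytes out of such a line; the name oops_bytes_at_rip is hypothetical (not a kernel or tooling API), and it does not decode instructions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy up to `max` bytes of an oops "Code:" line into `out`, starting at
 * the byte wrapped in '<' '>' (the first byte at the faulting RIP).
 * Returns the number of bytes copied, or -1 if no valid marker is found.
 */
int oops_bytes_at_rip(const char *line, unsigned char *out, int max)
{
    const char *p;
    char *end;
    long v;
    int n = 0;

    assert(line != NULL && out != NULL);
    p = strchr(line, '<');
    if (p == NULL || max <= 0)
        return -1;

    /* The marked byte itself, e.g. "<8b>". */
    v = strtol(p + 1, &end, 16);
    if (end == p + 1 || *end != '>' || v < 0 || v > 0xff)
        return -1;
    out[n++] = (unsigned char)v;
    p = end + 1;

    /* Following bytes; strtol skips the separating spaces. */
    while (n < max) {
        v = strtol(p, &end, 16);
        if (end == p || v < 0 || v > 0xff)
            break;                 /* end of line or not a single byte */
        out[n++] = (unsigned char)v;
        p = end;
    }
    return n;
}
```

Fed the Code: line from the log above with max = 3, it yields the bytes 8b 03 e9: the faulting load followed by the start of the next instruction.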
A commit in 3.17-rc6 causes this kernel panic when the dedicated GPU is switched off for the first time after booting the system. This is still reproducible with 4.1-rcX booted with radeon.runpm=0 (plus radeon.dpm=0).
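The switch-off step can be scripted. vga_switcheroo is controlled by writing a command such as OFF to its debugfs file (the path /sys/kernel/debug/vgaswitcheroo/switch is the one mentioned later in this report), which is what `echo OFF > .../switch` does as root. The helper below is a minimal sketch of that write; switcheroo_command is a hypothetical name, and the sketch assumes debugfs is mounted and the caller has root privileges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SWITCHEROO_PATH "/sys/kernel/debug/vgaswitcheroo/switch"

/*
 * Write a single vga_switcheroo command (e.g. "OFF") plus a newline to
 * `path`, mirroring `echo OFF > switch`. Returns 0 on success, -1 if the
 * command is malformed or the file cannot be opened or written.
 */
int switcheroo_command(const char *path, const char *cmd)
{
    FILE *f;
    int ok;

    assert(path != NULL && cmd != NULL);
    if (cmd[0] == '\0' || strchr(cmd, '\n') != NULL)
        return -1;                  /* one word per write, newline added below */

    f = fopen(path, "w");
    if (f == NULL)
        return -1;                  /* debugfs not mounted, not root, or file gone */

    ok = fputs(cmd, f) >= 0 && fputc('\n', f) != EOF;
    if (fclose(f) != 0)             /* buffered write errors surface on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

As root, `switcheroo_command(SWITCHEROO_PATH, "OFF")` would power off the inactive GPU, which is the first step of the reproduction described above.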
Created attachment 116867: drm/radeon: Don't flush the GART TLB if rdev->gart.ptr == NULL

Does this patch fix the problem?
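The patch title describes a NULL check in front of the TLB flush. The sketch below is a self-contained model of that guard, not the actual radeon code: struct gart_model, struct rdev_model and the model_* functions are hypothetical stand-ins, and it assumes, as the patch title suggests, that the flush path reads through rdev->gart.ptr, which is NULL while the dGPU is powered off. CPU-side bookkeeping is always updated; hardware entries and the flush are only touched while the table is mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the driver state. */
struct gart_model {
    uint32_t *ptr;          /* mapped GART table; NULL while the dGPU is off */
    unsigned char *bound;   /* CPU-side bookkeeping, always valid */
    unsigned num_entries;
};

struct rdev_model {
    struct gart_model gart;
    unsigned tlb_flushes;   /* counts simulated TLB flushes */
};

/* Simulated flush that reads back through the table mapping. */
static void model_tlb_flush(struct rdev_model *rdev)
{
    volatile uint32_t *p = rdev->gart.ptr;

    assert(p != NULL);      /* the real readback would oops here instead */
    (void)p[0];
    rdev->tlb_flushes++;
}

/* Unbind `count` entries starting at `first`. */
void model_gart_unbind(struct rdev_model *rdev, unsigned first, unsigned count)
{
    struct gart_model *g = &rdev->gart;

    for (unsigned i = first; i < g->num_entries && i - first < count; i++) {
        g->bound[i] = 0;    /* bookkeeping is updated unconditionally */
        if (g->ptr != NULL)
            g->ptr[i] = 0;  /* hardware entry only exists while mapped */
    }

    /* The guard named in the patch title: no flush while unmapped. */
    if (g->ptr != NULL)
        model_tlb_flush(rdev);
}
```

The design point is that freeing a buffer object while the GPU is off (as Xorg does at logout) must still succeed, so only the hardware-facing steps are skipped.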
I will test your patch as soon as possible. Meanwhile, I have finished bisecting the kernel, with this result:

b440bde74f043c8ec31081cb59c9a53ade954701 is the first bad commit
Created attachment 116881: dmesg snippet

I have applied radeon-gart_tlb_flush-NULL.diff to git master, and it does fix the hard lockup. However, powering off the dedicated GPU still triggers some kernel panics (see the attached log file).
You are probably seeing this bug, which is in the PCI hotplug system:
https://bugzilla.kernel.org/show_bug.cgi?id=61891
See if the latest patch there helps.
Alex, do you mean the patch from comment #83? The previous one from comment #78 was already applied -> 0824965140fff1bf640a987dc790d1594a8e0699.
(In reply to Thaddaeus Tintenfisch from comment #6)
> Alex, do you mean the patch from comment #83?
> The previous one from comment #78 was already applied ->
> 0824965140fff1bf640a987dc790d1594a8e0699.

I didn't realize it had already been applied.
The warnings (not panics) would need to be tracked in separate reports, but at least the first one is harmless.
The first two warnings are pretty much identical (3 warnings in total). Should I create two new reports which reference the patch from this report?

The main issue here is that the vgaswitcheroo interface is no longer available after powering off the dGPU (/sys/kernel/debug/vgaswitcheroo/switch is gone).

From the log:

[ 60.462677] vga_switcheroo: disabled
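The symptom here (the switch file disappearing) can be checked programmatically. One caveat: debugfs is often accessible only to root, so a permission error should not be mistaken for the file being gone. The sketch below distinguishes the two cases with access() and errno; switcheroo_present is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Returns 1 if `path` exists, 0 if it (or a parent directory) does not
 * exist, and -1 for other errors such as EACCES when debugfs is
 * restricted to root.
 */
int switcheroo_present(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) == 0)
        return 1;
    return (errno == ENOENT || errno == ENOTDIR) ? 0 : -1;
}
```

Run as root after powering off the dGPU, `switcheroo_present("/sys/kernel/debug/vgaswitcheroo/switch")` returning 0 would match the behavior reported here.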
(In reply to Thaddaeus Tintenfisch from comment #9)
> The first two warnings are pretty much identical (3 warnings in total).
> Should I create two new reports which reference the patch from this report?
>
> The main issue here is that the vgaswitcheroo interface is no longer
> available after powering off the dGPU
> (/sys/kernel/debug/vgaswitcheroo/switch is gone).
>
> From the log:
>
> [ 60.462677] vga_switcheroo: disabled

It's still an ACPI hotplug bug:

[ 60.454402] [<ffffffffc022c295>] radeon_pci_remove+0x15/0x20 [radeon]
[ 60.454407] [<ffffffff813e470f>] pci_device_remove+0x3f/0xc0
[ 60.454414] [<ffffffff814e6a86>] __device_release_driver+0x96/0x130
[ 60.454418] [<ffffffff814e6b43>] device_release_driver+0x23/0x30
[ 60.454423] [<ffffffff813df012>] pci_stop_bus_device+0x92/0xa0
[ 60.454427] [<ffffffff813df136>] pci_stop_and_remove_bus_device+0x16/0x30
[ 60.454432] [<ffffffff813fbf23>] disable_slot+0x53/0xa0
[ 60.454436] [<ffffffff813fc642>] acpiphp_check_bridge.part.8+0xd2/0xf0
[ 60.454440] [<ffffffff813fcec2>] acpiphp_hotplug_notify+0xd2/0x220
[ 60.454445] [<ffffffff813fcdf0>] ? acpiphp_post_dock_fixup+0xc0/0xc0
[ 60.454450] [<ffffffff81429fb7>] acpi_device_hotplug+0x3b0/0x3f8
[ 60.454454] [<ffffffff814233ae>] acpi_hotplug_work_fn+0x1f/0x2b

The acpiphp driver is trying to remove the driver after switcheroo has turned the GPU off. It should not be kicking in and removing the driver; it looks like there is another broken case in the acpiphp code. I'd suggest either filing a new acpiphp bug on bugzilla.kernel.org that references https://bugzilla.kernel.org/show_bug.cgi?id=61891, or adding a comment to that bug saying that some cases are still broken.
I have added a comment to the linked report. In that case, the patch from comment #2 of this bug can be forwarded. Thanks.
Is anything missing? Does the patch need more testing?
Oh, 4.2-rc3 includes the patch. Thanks. http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=233709d2cd6bbaaeda0aeb8d11f6ca7f98563b39
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.