Bug 70927

Summary: [NVE7] kernel panic in nv50_instobj_wr32 after switcheroo cycle
Product: Mesa
Reporter: Antonio Vázquez Blanco <antoniovazquezblanco>
Component: Drivers/DRI/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED INVALID
QA Contact:
Severity: normal
Priority: medium
CC: antoniovazquezblanco
Version: 9.2
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
Attachments: Kernel panic log.
Kernel log vw
Full log

Description Antonio Vázquez Blanco 2013-10-27 18:02:42 UTC
Created attachment 88189
Kernel panic log.

I regularly get kernel panics on my Alienware M14x R2.

I am running an up-to-date Arch Linux:
Linux alienarch 3.11.6-1-ARCH #1 SMP PREEMPT Fri Oct 18 23:22:36 CEST 2013 x86_64 GNU/Linux

Nouveau is at version 9.2.2.

The kernel panic log is attached.

What else can I do to help?

Thanks.
Comment 1 Antonio Vázquez Blanco 2013-10-27 18:48:49 UTC
Created attachment 88193
Kernel log vw

Updated log. The previous one was not OK.
Comment 2 Ilia Mirkin 2013-10-27 19:34:11 UTC
From the code in the trace, it looks like node->mem is somehow NULL. Can you supply a full dmesg?
Comment 3 Antonio Vázquez Blanco 2013-10-28 08:23:00 UTC
Created attachment 88210
Full log

Looking at the full log now, I can see a lot of other things that should be taken into account. Sorry for trimming the information earlier.
Comment 4 Ilia Mirkin 2013-10-28 14:54:29 UTC
The warning below can't be good. It happens after switcheroo turns the card off and the card goes to D3cold. I suspect the crash is related to this.

------------[ cut here ]------------
WARNING: CPU: 6 PID: 401 at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xd0()
Watchdog detected hard LOCKUP on cpu 6
Modules linked in:
 joydev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_codec_ca0132 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul crc32c_intel snd_hda_intel ghash_clmulni_intel aesni_intel aes_x86_64 snd_hda_codec lrw gf128mul glue_helper arc4 dell_wmi snd_hwdep snd_pcm sparse_keymap snd_page_alloc ath9k ath9k_common ath9k_hw ath mac80211 nouveau cfg80211 mxm_wmi iTCO_wdt rfkill ttm psmouse iTCO_vendor_support snd_timer atl1c serio_raw rtsx_pci_ms memstick ablk_helper cryptd snd fan microcode thermal wmi mperf ac evdev shpchp processor pcspkr soundcore battery lpc_ich i2c_i801 mei_me mei ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod sd_mod cdrom rtsx_pci_sdmmc ahci libahci libata sdhci_pci sdhci
 ehci_pci xhci_hcd ehci_hcd scsi_mod mmc_core rtsx_pci usbcore usb_common i915 video button i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core
CPU: 6 PID: 401 Comm: Xorg Not tainted 3.11.6-1-ARCH #1
Hardware name: Alienware M14xR2/M14xR2, BIOS A10 06/29/2012
 0000000000000009 ffff88025f386c10 ffffffff814dba02 ffff88025f386c58
 ffff88025f386c48 ffffffff8106193d ffff880253688000 0000000000000000
 ffff88025f386d78 0000000000000000 ffff88025f386ef8 ffff88025f386ca8
Call Trace:
 <NMI>  [<ffffffff814dba02>] dump_stack+0x54/0x8d
 [<ffffffff8106193d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff810619ac>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff8101c665>] ? native_sched_clock+0x15/0x80
 [<ffffffff8101c6d9>] ? sched_clock+0x9/0x10
 [<ffffffff810e9950>] ? watchdog_enable_all_cpus.part.2+0x40/0x40
 [<ffffffff810e99ec>] watchdog_overflow_callback+0x9c/0xd0
 [<ffffffff8112962e>] __perf_event_overflow+0x8e/0x2b0
 [<ffffffff811284b7>] ? perf_event_update_userpage+0xe7/0x160
 [<ffffffff8112a1e4>] perf_event_overflow+0x14/0x20
 [<ffffffff8103072d>] intel_pmu_handle_irq+0x1bd/0x3c0
 [<ffffffff814e489b>] perf_event_nmi_handler+0x2b/0x50
 [<ffffffff814e3ea1>] nmi_handle.isra.3+0xa1/0x1d0
 [<ffffffff814e4139>] do_nmi+0x169/0x340
 [<ffffffff814e34f1>] end_repeat_nmi+0x1e/0x2e
 [<ffffffff81298a12>] ? ioread32+0x42/0x50
 [<ffffffff81298a12>] ? ioread32+0x42/0x50
 [<ffffffff81298a12>] ? ioread32+0x42/0x50
 <<EOE>>  [<ffffffffa078d7cb>] ? nv04_timer_read+0x3b/0x70 [nouveau]
 [<ffffffffa078d574>] nouveau_timer_wait_eq+0x74/0xd0 [nouveau]
 [<ffffffffa076f362>] nv84_bar_flush+0x52/0x90 [nouveau]
 [<ffffffffa0790892>] nvc0_vm_flush+0x42/0x1a0 [nouveau]
 [<ffffffffa079061c>] ? nvc0_vm_map+0xfc/0x110 [nouveau]
 [<ffffffffa078e1c5>] nouveau_vm_map_at+0x165/0x1d0 [nouveau]
 [<ffffffffa078e243>] nouveau_vm_map+0x13/0x20 [nouveau]
 [<ffffffffa07cb09c>] nouveau_bo_move_ntfy+0xbc/0xd0 [nouveau]
 [<ffffffffa06b0f1e>] ttm_bo_handle_move_mem+0x20e/0x5c0 [ttm]
 [<ffffffffa06b19b9>] ? ttm_bo_mem_space+0x179/0x360 [ttm]
 [<ffffffffa06b1f97>] ttm_bo_move_buffer+0x117/0x130 [ttm]
 [<ffffffff8120364d>] ? proc_alloc_inode+0x1d/0xb0
 [<ffffffffa06b203a>] ttm_bo_validate+0x8a/0x100 [ttm]
 [<ffffffffa07cc5cc>] nouveau_bo_validate+0x1c/0x20 [nouveau]
 [<ffffffffa07ce159>] validate_list+0x69/0x310 [nouveau]
 [<ffffffffa07cf4ca>] nouveau_gem_ioctl_pushbuf+0x9aa/0x1560 [nouveau]
 [<ffffffff814df7ce>] ? mutex_unlock+0xe/0x10
 [<ffffffffa00111a2>] drm_ioctl+0x532/0x660 [drm]
 [<ffffffff81072aa7>] ? kill_pid_info+0x47/0x60
 [<ffffffff811b1c05>] do_vfs_ioctl+0x2e5/0x4d0
 [<ffffffff810711a2>] ? __set_task_blocked+0x32/0x70
 [<ffffffff811a15ee>] ? ____fput+0xe/0x10
 [<ffffffff811b1e71>] SyS_ioctl+0x81/0xa0
 [<ffffffff814e665e>] ? do_page_fault+0xe/0x10
 [<ffffffff814ea5dd>] system_call_fastpath+0x1a/0x1f
---[ end trace 7fcf10949e51422c ]---


Then, when turning the card back on,

nouveau E[      VM][0000:01:00.0] vm timeout 1: 0xbadf1200 1

This probably leaves the VM uninitialized (?), which leads to the BUG below, caused by node->mem being NULL:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
IP: [<ffffffffa07887ab>] nv50_instobj_wr32+0x2b/0xc0 [nouveau]
PGD 24b816067 PUD 2524ef067 PMD 0 
Oops: 0000 [#1] PREEMPT SMP 
Modules linked in: joydev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_codec_ca0132 x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crc32_pclmul crc32c_intel snd_hda_intel ghash_clmulni_intel aesni_intel aes_x86_64 snd_hda_codec lrw gf128mul glue_helper arc4 dell_wmi snd_hwdep snd_pcm sparse_keymap snd_page_alloc ath9k ath9k_common ath9k_hw ath mac80211 nouveau cfg80211 mxm_wmi iTCO_wdt rfkill ttm psmouse iTCO_vendor_support snd_timer atl1c serio_raw rtsx_pci_ms memstick ablk_helper cryptd snd fan microcode thermal wmi mperf ac evdev shpchp processor pcspkr soundcore battery lpc_ich i2c_i801 mei_me mei ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod sd_mod cdrom rtsx_pci_sdmmc ahci libahci libata sdhci_pci
 sdhci ehci_pci xhci_hcd ehci_hcd scsi_mod mmc_core rtsx_pci usbcore usb_common i915 video button i2c_algo_bit intel_agp intel_gtt drm_kms_helper drm i2c_core [last unloaded: coretemp]
CPU: 3 PID: 375 Comm: bumblebeed Tainted: G        W    3.11.6-1-ARCH #1
Hardware name: Alienware M14xR2/M14xR2, BIOS A10 06/29/2012
task: ffff88024f82a1c0 ti: ffff88025251a000 task.ti: ffff88025251a000
RIP: 0010:[<ffffffffa07887ab>]  [<ffffffffa07887ab>] nv50_instobj_wr32+0x2b/0xc0 [nouveau]
RSP: 0018:ffff88025251bc60  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff880252d8d900 RCX: ffffffffa0814ac0
RDX: 00000000ffeefeff RSI: 0000000000000000 RDI: ffff88024ee58060
RBP: ffff88025251bc90 R08: 0000000000000000 R09: ffffffff8116b8ca
R10: ffff88025251bfd8 R11: 0000000000000001 R12: ffff88024ee58060
R13: 00000ffffff00000 R14: 00000000ffeefeff R15: 0000000000000000
FS:  00007fc9e4f01700(0000) GS:ffff88025f2c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000e0 CR3: 000000025230f000 CR4: 00000000001407e0
Stack:
 0000000000008000 0000000000000004 ffff88024ee58060 ffff880252d8d970
 ffff880252d8d920 0000000000000000 ffff88025251bcc0 ffffffffa0787f84
 ffff880252d8d900 0000000000000000 ffff88025354b000 0000000000000000
Call Trace:
 [<ffffffffa0787f84>] nouveau_instmem_init+0x84/0xc0 [nouveau]
 [<ffffffffa07880be>] _nouveau_instmem_init+0xe/0x10 [nouveau]
 [<ffffffffa076dffd>] nouveau_object_inc+0xbd/0x1b0 [nouveau]
 [<ffffffffa07937c5>] nouveau_device_init+0x25/0xa0 [nouveau]
 [<ffffffffa076dffd>] nouveau_object_inc+0xbd/0x1b0 [nouveau]
 [<ffffffffa076dfd7>] nouveau_object_inc+0x97/0x1b0 [nouveau]
 [<ffffffffa076c79b>] nouveau_handle_init+0x7b/0x230 [nouveau]
 [<ffffffffa076c831>] nouveau_handle_init+0x111/0x230 [nouveau]
 [<ffffffffa076b162>] nouveau_client_init+0x32/0x60 [nouveau]
 [<ffffffffa07c6744>] nouveau_do_resume+0x64/0x130 [nouveau]
 [<ffffffffa07c6870>] nouveau_pmops_resume+0x60/0x70 [nouveau]
 [<ffffffffa07c96c0>] nouveau_switcheroo_set_state+0x90/0xb0 [nouveau]
 [<ffffffff81371a95>] vga_switchon+0x35/0x50
 [<ffffffff81372328>] vga_switcheroo_debugfs_write+0x368/0x3b0
 [<ffffffff8119fafd>] vfs_write+0xbd/0x1e0
 [<ffffffff811a0559>] SyS_write+0x49/0xa0
 [<ffffffff814ea5dd>] system_call_fastpath+0x1a/0x1f
Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 89 d6 41 55 49 bd 00 00 f0 ff ff 0f 00 00 41 54 53 48 83 ec 08 48 8b 47 48 48 8b 5f 10 <48> 03 b0 e0 00 00 00 4c 8d a3 90 00 00 00 4c 89 e7 49 21 f5 81 
RIP  [<ffffffffa07887ab>] nv50_instobj_wr32+0x2b/0xc0 [nouveau]
 RSP <ffff88025251bc60>
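
As a cross-check on the NULL-pointer reading, the faulting instruction can be decoded by hand from the Code: line of the oops. The Python sketch below is only an illustration: the helper names are made up, it handles just the one instruction form that appears here, and it assumes the `<48>` marker flags the byte at the faulting RIP, as is usual for x86 oops output.

```python
# Values copied from the oops above (Code: line truncated to the relevant bytes).
CODE_LINE = "48 8b 47 48 48 8b 5f 10 <48> 03 b0 e0 00 00 00 4c 8d a3 90 00 00 00"
REGS = {"rax": 0x0000000000000000}   # RAX from the register dump
CR2 = 0x00000000000000e0             # faulting address reported by the CPU

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes starting at the <..> marker (the faulting instruction)."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_add_r64_mem_disp32(b):
    """Decode only 'REX.W 03 /r' with mod=10, i.e. add reg, [base + disp32]."""
    rex, opcode, modrm = b[0], b[1], b[2]
    if rex != 0x48 or opcode != 0x03:
        raise ValueError("not 'add r64, r/m64' with plain (non-extended) registers")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the [base + disp32] form without SIB is handled")
    disp = int.from_bytes(bytes(b[3:7]), "little", signed=True)
    return REG64[reg], REG64[rm], disp


dest, base, disp = decode_add_r64_mem_disp32(faulting_bytes(CODE_LINE))
fault_address = REGS[base] + disp
print(f"add {dest}, [{base}+{disp:#x}] -> fault at {fault_address:#x}")
```

The computed address matches CR2 (0xe0) with RAX equal to zero, which is consistent with a field at offset 0xe0 being read through a NULL pointer. That the pointer in question is node->mem comes from comment 2, not from this decoding.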
Comment 5 Ilia Mirkin 2014-08-21 21:28:09 UTC
Can you check whether this still happens with recent kernels? With 3.13.x the card should automatically power on/off as needed.
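
For anyone retesting, a rough Python sketch of the two checks implied here: whether the kernel is at least 3.13, and what runtime power state the PCI core reports for the card. The function names are made up for illustration. The PCI address 0000:01:00.0 is taken from the "vm timeout" line in comment 4, and `power/runtime_status` is the generic PCI runtime-PM sysfs attribute; whether nouveau actually drives it on a given kernel is an assumption based on this comment.

```python
import os
import re


def kernel_at_least(release, wanted=(3, 13)):
    """Compare the leading 'major.minor' of a `uname -r` string against `wanted`."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= wanted


def runtime_status(pci_addr="0000:01:00.0", sysfs="/sys/bus/pci/devices"):
    """Return the device's runtime PM status string, or None if it cannot be read."""
    path = os.path.join(sysfs, pci_addr, "power", "runtime_status")
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


# The kernel from the original report predates 3.13.
print("3.11.6-1-ARCH new enough:", kernel_at_least("3.11.6-1-ARCH"))
print("runtime PM status:", runtime_status())
```

On a machine without that PCI device, or on a non-Linux system, `runtime_status()` simply returns None.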
Comment 6 Ilia Mirkin 2015-10-22 19:54:01 UTC
No response to the retest request from over a year ago.