Bug 71824

Summary: [NVE6] NULL deref on boot when there is nothing in DCB on 3.13-rc
Product: xorg
Component: Driver/nouveau
Status: VERIFIED FIXED
Severity: normal
Priority: medium
Version: git
Hardware: Other
OS: All
Reporter: Guo Jinxian <jinxianx.guo>
Assignee: Nouveau Project <nouveau>
QA Contact: Xorg Project Team <xorg-team>
CC: intel-gfx-bugs, przanoni
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description    Flags
dmesg          none
lsmod info     none

Description Guo Jinxian 2013-11-20 08:37:32 UTC
Created attachment 89516 [details]
dmesg

System Environment:
--------------------------
Platform:   HSW Mobile
Kernel: (drm-intel-testing)f400ddc64ab74ae754896138f1aacd4b4ad62def
Some additional commit info:
Merge: acd64af 2dead37
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Nov 15 11:03:30 2013 +0100

Detail description:
-------------------------

A kernel call trace is logged while the system is booting.

This bug cannot be reproduced on -fixes (1d37b689b1c07c01101534e55ffcd43f69069355) or -next-queued (f671d117bc0338b67b0a7485882d332fe6c4b570).

The nouveau module is loaded and in use on the -testing kernel, but it does not load on the -fixes and -next-queued kernels; please check the lsmod information in the attachment.

Call Trace:
--------------------------
[    4.929052] Call Trace:
[    4.930196]  [<ffffffffa01a62ae>] nouveau_object_ctor+0x32/0xaf [nouveau]
[    4.931378]  [<ffffffffa01a6a30>] nouveau_object_new+0x115/0x1e3 [nouveau]
[    4.932528]  [<ffffffffa0213f87>] nouveau_drm_load+0x54f/0x787 [nouveau]
[    4.933668]  [<ffffffff8111ecfe>] ? kmem_cache_alloc_trace+0xb8/0x135
[    4.934752]  [<ffffffffa000a0b9>] ? drm_get_minor+0x19d/0x1e2 [drm]
[    4.935815]  [<ffffffffa000a201>] drm_dev_register+0x103/0x1b7 [drm]
[    4.936886]  [<ffffffffa000bee3>] drm_get_pci_dev+0xa5/0x12f [drm]
[    4.937932]  [<ffffffff8138646c>] ? __pci_set_master+0x2b/0x79
[    4.939010]  [<ffffffffa02136a6>] nouveau_drm_probe+0x1e7/0x20a [nouveau]
[    4.940057]  [<ffffffff81418a17>] ? driver_probe_device+0x1a3/0x1a3
[    4.941088]  [<ffffffff81389ee2>] local_pci_probe+0x20/0x32
[    4.942111]  [<ffffffff8138a980>] pci_device_probe+0xbf/0xe5
[    4.943113]  [<ffffffff8141890a>] driver_probe_device+0x96/0x1a3
[    4.944118]  [<ffffffff81418a79>] __driver_attach+0x62/0x85
[    4.945124]  [<ffffffff81416dd8>] bus_for_each_dev+0x5f/0x91
[    4.946132]  [<ffffffff8141848c>] driver_attach+0x1e/0x20
[    4.947147]  [<ffffffff81418035>] bus_add_driver+0xf9/0x245
[    4.948169]  [<ffffffff81419068>] driver_register+0x8c/0xc3
[    4.949190]  [<ffffffff8138aa74>] __pci_register_driver+0x61/0x65
[    4.950213]  [<ffffffffa0298000>] ? 0xffffffffa0297fff
[    4.951232]  [<ffffffffa000bff8>] drm_pci_init+0x8b/0xf1 [drm]
[    4.952245]  [<ffffffffa0298000>] ? 0xffffffffa0297fff
[    4.953258]  [<ffffffffa0298043>] nouveau_drm_init+0x43/0x45 [nouveau]
[    4.954281]  [<ffffffff81000271>] do_one_initcall+0x84/0x10f
[    4.955299]  [<ffffffff8105c3bd>] ? __blocking_notifier_call_chain+0x51/0x5f
[    4.956328]  [<ffffffff8109b2b9>] load_module+0x1adc/0x1de3
[    4.957347]  [<ffffffff810988cc>] ? __unlink_module+0x25/0x25
[    4.958383]  [<ffffffff8109b667>] SyS_init_module+0xa7/0xb6
[    4.959415]  [<ffffffff818036d2>] system_call_fastpath+0x16/0x1b
[    4.960458] Code: 89 44 24 10 c7 44 24 08 00 01 00 00 c7 04 24 00 00 00 00 e8 bd 1a f9 ff 48 8b 55 c8 85 c0 41 89 c5 49 89 14 24 0f 85 98 00 00 00 <49> 8b 87 30 01 00 00 be d0 80 00 00 48 63 b8 88 00 00 00 89 ba 
[    4.961670] RIP  [<ffffffffa021277e>] nv50_software_context_ctor+0x78/0x122 [nouveau]
[    4.962801]  RSP <ffff88023f775848>
[    4.963896] CR2: 0000000000000130
[    4.964991] ---[ end trace 3ca31f7e0cfdb7c0 ]---
Comment 1 Guo Jinxian 2013-11-20 08:38:24 UTC
Created attachment 89517 [details]
lsmod info
Comment 2 Daniel Vetter 2013-11-20 09:59:04 UTC
Can you please attach the output of lspci -nn?
Comment 3 Guo Jinxian 2013-11-21 01:12:59 UTC
Here is the PCI information. Thanks.

[root@x-hswm24 ~]# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Haswell DRAM Controller [8086:0c04] (rev 06)
00:01.0 PCI bridge [0604]: Intel Corporation Haswell PCI Express x16 Controller [8086:0c01] (rev 06)
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell Integrated Graphics Controller [8086:0416] (rev 06)
00:03.0 Audio device [0403]: Intel Corporation Haswell HD Audio Controller [8086:0c0c] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation Lynx Point USB xHCI Host Controller [8086:8c31] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation Lynx Point MEI Controller #1 [8086:8c3a] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation Lynx Point USB Enhanced Host Controller #2 [8086:8c2d] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation Lynx Point High Definition Audio Controller [8086:8c20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #1 [8086:8c10] (rev d4)
00:1c.3 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #4 [8086:8c16] (rev d4)
00:1c.4 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #5 [8086:8c18] (rev d4)
00:1c.5 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #6 [8086:8c1a] (rev d4)
00:1d.0 USB controller [0c03]: Intel Corporation Lynx Point USB Enhanced Host Controller #1 [8086:8c26] (rev 04)
00:1f.0 ISA bridge [0601]: Intel Corporation Lynx Point LPC Controller [8086:8c4b] (rev 04)
00:1f.2 SATA controller [0106]: Intel Corporation Lynx Point 6-port SATA Controller 1 [AHCI mode] [8086:8c03] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation Lynx Point SMBus Controller [8086:8c22] (rev 04)
01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:11e2] (rev a1)
03:00.0 Ethernet controller [0200]: Qualcomm Atheros Killer E2200 Gigabit Ethernet Controller [1969:e091] (rev 13)
04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device [10ec:5227] (rev 01)
05:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8723AE PCIe Wireless Network Adapter [10ec:8723]
Comment 4 Paulo Zanoni 2013-11-21 13:41:58 UTC
01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:11e2] (rev a1)

What's this? Can you try to disconnect it?

We should probably reassign this to the nouveau devs.
Comment 5 Daniel Vetter 2013-11-25 10:11:05 UTC
Reassigned to nouveau, not really an intel issue.
Comment 6 Ilia Mirkin 2013-11-25 10:27:50 UTC
Looks like you have no outputs. I think this will lead to

	chan->vblank.nr_event = pdisp->vblank->index_nr;

failing in nv50_software_context_ctor (called from nouveau_accel_init), since pdisp->vblank == NULL. I suspect that ->index_nr is at offset 0x130, which explains the CR2 that you see.

As a temporary workaround, booting with nouveau.noaccel=1 will avoid the crash (and also remove your ability to use the card).
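
To illustrate the reasoning above, here is a minimal user-space sketch in C. The struct and function names (event_sketch, disp_sketch, fault_address, vblank_count) are hypothetical stand-ins, not the real nouveau definitions, and the 0x130 padding is chosen only to match the offset suspected above. It shows why reading a field through a NULL base pointer faults at the field's offset, and what a NULL guard looks like. It is not the contents of the actual fix.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the vblank event object. The padding is an
 * assumption that places index_nr at offset 0x130, matching the suspicion
 * in this comment; the real layout may differ. */
struct event_sketch {
    char pad[0x130];
    int index_nr;
};

/* Hypothetical stand-in for the display engine object. */
struct disp_sketch {
    struct event_sketch *vblank; /* NULL when the board has no outputs */
};

/* Address the CPU would access for base->index_nr, computed without
 * dereferencing. With base == NULL this is just the field offset, which
 * is the kind of value reported in CR2 on a page fault. */
static uintptr_t fault_address(const struct event_sketch *base)
{
    return (uintptr_t)base + offsetof(struct event_sketch, index_nr);
}

/* Guarded read: stores the event count in *out and returns 0, or returns
 * -1 without touching *out when there is no vblank event source. */
static int vblank_count(const struct disp_sketch *disp, int *out)
{
    if (disp == NULL || disp->vblank == NULL)
        return -1;
    *out = disp->vblank->index_nr;
    return 0;
}

/* Prints the address an unguarded read would fault on. */
static void report(const struct disp_sketch *disp)
{
    printf("unguarded read would access 0x%lx\n",
           (unsigned long)fault_address(disp->vblank));
}
```

With disp->vblank set to NULL, fault_address() yields the bare field offset, which is how a CR2 value as small as 0x130 points to a NULL base pointer rather than a wild one.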
Comment 7 Ilia Mirkin 2013-11-30 06:42:47 UTC
You can apply the equivalent of this patch to your tree (the paths are wrong, but it should be obvious how to fix them up): http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=2c2e48684dbe49f27e51ec0b2a4aa50df0f0f295
Comment 8 Ben Skeggs 2013-11-30 09:02:09 UTC
(In reply to comment #7)
> You can apply the equivalent of this patch to your tree (the paths are
> wrong, but it should be obvious how to fix them up):
> http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=2c2e48684dbe49f27e51ec0b2a4aa50df0f0f295

Or you can clone the tree and run "cd drm; make && sudo make install", which gives you a module built against your current kernel.
Comment 9 Emil Velikov 2013-12-13 19:49:25 UTC
The fix is part of the 3.13 fixes pull request and should land shortly. Let us know if you're still having problems with the upcoming 3.13-rc4.
Comment 10 Guo Jinxian 2013-12-17 05:57:10 UTC
Checked on -testing (95d42e7d341861132ffec806328dd7fc85516995); this bug has been fixed. Thanks.
