Bug 91738 - [NV117] NULL deref in nvkm_i2c_try_acquire_pad, kernel 4.1
Status: REOPENED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2015-08-23 19:00 UTC by ryanpcmcquen
Modified: 2015-11-21 00:24 UTC

Attachments
dmesg_before_startx (69.58 KB, text/plain)
2015-08-23 19:00 UTC, ryanpcmcquen
Xorg log (8.88 KB, text/plain)
2015-08-23 19:00 UTC, ryanpcmcquen
dmesg_after_startx (70.03 KB, text/plain)
2015-08-23 19:01 UTC, ryanpcmcquen
/sys/kernel/debug/dri/0/vbios.rom (68.82 KB, application/octet-stream)
2015-08-25 04:38 UTC, ryanpcmcquen
dmesg on linux 4.1.8 (252.12 KB, text/plain)
2015-09-22 20:34 UTC, ryanpcmcquen
dmesg for linux 4.3-rc7 (130.85 KB, text/plain)
2015-10-25 17:43 UTC, ryanpcmcquen

Description ryanpcmcquen 2015-08-23 19:00:34 UTC
Created attachment 117881
dmesg_before_startx

Host machine is Slackware64-current.

- xf86-video-nouveau 1.0.11
- xorg-server 1.17.2
- Linux 4.1.6

X fails to start, apparently because nouveau is blocking it. Blacklisting nouveau in /etc/modprobe.d/nouveau.conf like so:

    blacklist nouveau

results in a kernel panic. See attached dmesg and Xorg.0.log. Forgive me if I am reporting this bug in the wrong place.

Thank you.


dmesg:
http://sprunge.us/fNHi

Xorg.0.log
http://sprunge.us/XhRO

dmesg after startx:
http://sprunge.us/gEdV
Comment 1 ryanpcmcquen 2015-08-23 19:00:53 UTC
Created attachment 117882
Xorg log
Comment 2 ryanpcmcquen 2015-08-23 19:01:12 UTC
Created attachment 117883
dmesg_after_startx
Comment 3 Ilia Mirkin 2015-08-23 19:08:38 UTC
What is this xcmddc? I suspect it's not helping.

But actually your problem is due to xf86-video-intel built against a recent libdrm version. You either need a patch, or to downgrade libdrm below 2.4.60. The intel ddx patch is available at http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=7fe2b2948652443ff43d907855bd7a051d54d309
Comment 4 Ilia Mirkin 2015-08-23 21:20:43 UTC
By the way, for the xcmddc issue, it may be helpful if you could provide your vbios (available at /sys/kernel/debug/dri/0/vbios.rom).

But Xorg is crashing most likely because of intel, which you need to rebuild against an older libdrm (or rebuild with the patch I linked to).
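To tell whether the installed libdrm falls in the range mentioned above (2.4.60 or newer), a small check along these lines may help. This is only a sketch: the function name `libdrm_is_affected` is made up for illustration, and it assumes GNU `sort -V` is available.

```shell
# Hypothetical helper: succeeds (exit 0) when the given libdrm version
# is 2.4.60 or newer, i.e. the range where the unpatched intel ddx is
# said to break. Relies on GNU sort's version ordering (-V).
libdrm_is_affected() {
    # $1: version string, e.g. the output of `pkg-config --modversion libdrm`
    lowest=$(printf '%s\n' "2.4.60" "$1" | sort -V | head -n 1)
    [ "$lowest" = "2.4.60" ]
}

# Example use (not run here):
#   libdrm_is_affected "$(pkg-config --modversion libdrm)" \
#       && echo "rebuild xf86-video-intel with the patch, or downgrade libdrm"
```

The comparison only looks at the version number; it cannot tell whether xf86-video-intel was already rebuilt with the patch.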
Comment 5 ryanpcmcquen 2015-08-23 22:46:21 UTC
Thank you Ilia! I will try that patch.
Comment 6 ryanpcmcquen 2015-08-25 02:49:10 UTC
Upgrading xf86-video-intel did the trick! Thank you, Ilia. On my first reboot, though, X would not start: it was a hard lockup (I was unable to get to another tty).

After forcing a reboot with the power button, I was able to start X on the next boot. I don't know if there is a relevant bug somewhere, but I will provide the necessary logs in case someone can make sense of them.

Thank you!


dmesg:
http://sprunge.us/ZSeG

Xorg.0.log.old:
http://sprunge.us/aGQM

Xorg.0.log:
http://sprunge.us/GEZK
Comment 7 Ilia Mirkin 2015-08-25 02:53:41 UTC
Looks like you still have the same failure from xcmddc. IMO it's worth investigating/fixing. Will need your vbios though, as I mentioned before.

[    8.700223] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[    8.700262] IP: [<ffffffffc0a3ca10>] nvkm_i2c_try_acquire_pad+0x40/0x80 [nouveau]
[    8.700265] PGD 2eb8067 PUD 2f1c067 PMD 0 
[    8.700269] Oops: 0000 [#1] SMP 
[    8.700313] Modules linked in: snd_hda_codec_hdmi snd_hda_intel snd_hda_controller snd_hda_codec snd_hda_core snd_hwdep joydev snd_pcm snd_timer hid_generic usbhid hid snd x86_pkg_temp_thermal intel_powerclamp coretemp intel_rapl btusb btbcm iwlmvm mac80211 btintel bluetooth iosf_mbi i2c_dev soundcore iwlwifi cfg80211 kvm_intel i915 r8169 rtsx_pci_ms nouveau kvm ttm drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel intel_gtt thermal processor video psmouse agpgart rtsx_pci_sdmmc rfkill mmc_core memstick mei_me mei lpc_ich rtsx_pci thermal_sys hwmon i2c_algo_bit xhci_pci xhci_hcd mxm_wmi ehci_pci wmi mii microcode i2c_i801 ehci_hcd evdev tpm_infineon serio_raw tpm_tis tpm battery button ac i2c_core loop
[    8.700321] CPU: 7 PID: 406 Comm: xcmddc Tainted: G          I     4.1.6 #2
[    8.700322] Hardware name: Notebook                         W230SS                 /W230SS                 , BIOS 4.6.5 05/13/2014
[    8.700324] task: ffff88041bec4380 ti: ffff8800c613c000 task.ti: ffff8800c613c000
[    8.700366] RIP: 0010:[<ffffffffc0a3ca10>]  [<ffffffffc0a3ca10>] nvkm_i2c_try_acquire_pad+0x40/0x80 [nouveau]
[    8.700369] RSP: 0018:ffff8800c613fc38  EFLAGS: 00010286
[    8.700370] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000010000000
[    8.700370] RDX: ffff88041817ec80 RSI: 0000000000000005 RDI: ffff88041c26d000
[    8.700371] RBP: ffff8800c613fc38 R08: 0000000000018de0 R09: ffff88041d803e00
[    8.700372] R10: ffff88041d803e00 R11: 0000000000000246 R12: ffff88041c26d000
[    8.700372] R13: 0000000000000000 R14: ffff880002ffdd00 R15: ffff88041c26d000
[    8.700373] FS:  00007fa6d1bab780(0000) GS:ffff88042fbc0000(0000) knlGS:0000000000000000
[    8.700374] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.700375] CR2: 0000000000000008 CR3: 000000041bd0d000 CR4: 00000000001406e0
[    8.700375] Stack:
[    8.700378]  ffff8800c613fca8 ffffffffc0a3cb1d ffff8800c613fc68 ffff8800c613fe08
[    8.700379]  0000000000008002 ffff8800c613fda8 ffff8800c613fe08 ffffffff811d3e74
[    8.700380]  ffff8800c613fcc8 ffffffff810b7779 ffff88041c26d000 ffff88041c26d398
[    8.700381] Call Trace:
[    8.700412]  [<ffffffffc0a3cb1d>] nvkm_i2c_acquire_pad+0xcd/0x150 [nouveau]
[    8.700422]  [<ffffffff811d3e74>] ? mntput+0x24/0x40
[    8.700433]  [<ffffffff810b7779>] ? update_curr+0xd9/0x160
[    8.700448]  [<ffffffffc0a3c930>] nvkm_i2c_acquire+0x40/0x60 [nouveau]
[    8.700462]  [<ffffffffc0a3e863>] aux_xfer+0x53/0x160 [nouveau]
[    8.700480]  [<ffffffffc000abf5>] __i2c_transfer+0x245/0x440 [i2c_core]
[    8.700484]  [<ffffffffc000ae44>] i2c_transfer+0x54/0x90 [i2c_core]
[    8.700487]  [<ffffffffc000aebf>] i2c_master_send+0x3f/0x50 [i2c_core]
[    8.700490]  [<ffffffffc001e460>] i2cdev_write+0x50/0x70 [i2c_dev]
[    8.700509]  [<ffffffff811b2ad8>] __vfs_write+0x28/0xf0
[    8.700518]  [<ffffffff81cc7982>] ? do_nanosleep+0x82/0x110
[    8.700534]  [<ffffffff815e2733>] ? security_file_permission+0x23/0xa0
[    8.700536]  [<ffffffff811b2f73>] ? rw_verify_area+0x53/0x100
[    8.700538]  [<ffffffff811b3209>] vfs_write+0xa9/0x1b0
[    8.700545]  [<ffffffff810e4180>] ? hrtimer_get_res+0x50/0x50
[    8.700547]  [<ffffffff811b4016>] SyS_write+0x46/0xb0
[    8.700549]  [<ffffffff81cc869b>] system_call_fastpath+0x16/0x6e
[    8.700567] Code: 42 08 48 8b 08 8b 09 81 e1 00 00 00 10 74 ec b8 01 00 00 00 f0 0f c1 42 1c 85 c0 74 36 48 8b 42 28 eb 11 0f 1f 84 00 00 00 00 00 <48> 8b 40 08 48 85 c0 74 0f 48 39 c7 75 f2 31 c0 5d c3 66 0f 1f 
[    8.700618] RIP  [<ffffffffc0a3ca10>] nvkm_i2c_try_acquire_pad+0x40/0x80 [nouveau]
[    8.700622]  RSP <ffff8800c613fc38>
[    8.700622] CR2: 0000000000000008
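For anyone triaging similar logs, the faulting symbol can be pulled out of a saved dmesg with a one-liner like the sketch below. The function name `oops_fault_symbol` is hypothetical, and the pattern assumes the `IP: [<addr>] symbol+off/size [module]` line format shown in this 4.1-era trace; other kernel versions print the oops differently. As a side note, the marked bytes in the Code: line (`<48> 8b 40 08`) appear to decode to a load from offset 8 of RAX, which is zero here, matching the fault address of 0x8.

```shell
# Hypothetical helper: print "symbol+offset/size [module]" from the first
# "IP:" line of an oops read on stdin. Lines starting with "RIP" are not
# matched because the pattern requires "] IP: " after the timestamp.
oops_fault_symbol() {
    sed -n 's/^.*\] IP: \[<[0-9a-f]*>\] //p' | head -n 1
}

# Example use (not run here):
#   dmesg | oops_fault_symbol
```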
Comment 8 ryanpcmcquen 2015-08-25 03:03:27 UTC
Hi Ilia, I appreciate your help. This directory is empty: /sys/kernel/debug/

Is there anywhere else I can look?
Comment 9 Ilia Mirkin 2015-08-25 03:08:43 UTC
Make sure that debugfs is mounted. e.g.

mount -t debugfs debugfs /sys/kernel/debug
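If it is unclear whether debugfs is already mounted, the mount table can be checked first. The helper below is a sketch; the name `debugfs_mounted` is invented here, and it assumes the usual `/proc/mounts` layout where the third field is the filesystem type.

```shell
# Hypothetical helper: succeeds (exit 0) when a debugfs entry is present
# in the given mount table.
debugfs_mounted() {
    # $1: mount table to inspect (defaults to /proc/mounts)
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Example use as root (not run here):
#   debugfs_mounted || mount -t debugfs debugfs /sys/kernel/debug
#   cp /sys/kernel/debug/dri/0/vbios.rom /tmp/vbios.rom
```

The copy destination `/tmp/vbios.rom` is just an example path.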
Comment 10 ryanpcmcquen 2015-08-25 03:22:10 UTC
Ah, I did not know that! Learning is fun.  :-)

As requested, vbios.rom:

http://sprunge.us/fLiF

Also, the system is still experiencing a hard lockup after a few minutes, so this is definitely worth investigating.
Comment 11 Ilia Mirkin 2015-08-25 03:25:43 UTC
Please attach files here.
Comment 12 ryanpcmcquen 2015-08-25 03:31:43 UTC
Sorry, trying to do this from my phone, since the computer will not stay on for more than a few minutes.
Comment 13 Ilia Mirkin 2015-08-25 03:33:51 UTC
Add nouveau.modeset=0 to your kernel cmdline; that should prevent nouveau from loading. I believe that xcmddc is killing off CPUs one by one as nouveau is mishandling something it's asking for.
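After rebooting, it can be worth confirming that the option actually reached the kernel. A minimal sketch, with a made-up function name `nouveau_modeset_value`, that reads the running kernel's command line:

```shell
# Hypothetical helper: print the value of nouveau.modeset from a kernel
# command line, or nothing if the option is absent. If the option is
# given more than once, the last value is printed.
nouveau_modeset_value() {
    # $1: file holding the command line (defaults to /proc/cmdline)
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^nouveau\.modeset=//p' | tail -n 1
}

# Example use (not run here):
#   [ "$(nouveau_modeset_value)" = "0" ] && echo "nouveau.modeset=0 is set"
```

This only shows what was passed on the command line; `lsmod` can confirm whether the nouveau module ended up loaded.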
Comment 14 ryanpcmcquen 2015-08-25 04:38:28 UTC
Created attachment 117904
/sys/kernel/debug/dri/0/vbios.rom
Comment 15 ryanpcmcquen 2015-08-25 04:38:46 UTC
Thank you, I was able to get it attached.
Comment 16 ryanpcmcquen 2015-09-19 01:26:13 UTC
Ilia, should I change the status of this bug?
Comment 17 ryanpcmcquen 2015-09-22 20:34:56 UTC
Created attachment 118404
dmesg on linux 4.1.8

Just so you know, I am still getting a hard system freeze on Linux 4.1.8.

http://sprunge.us/OQUT
Comment 18 Ben Skeggs 2015-09-22 20:45:10 UTC
(In reply to ryanpcmcquen from comment #17)
> Created attachment 118404
> dmesg on linux 4.1.8
> 
> Just so you know, I am still getting a hard system freeze on Linux 4.1.8.
> 
> http://sprunge.us/OQUT

Are you able to try with a 4.3-rc kernel? This code got reworked to be more sane and hopefully fixes this issue as a side-effect.
Comment 19 ryanpcmcquen 2015-09-22 21:42:59 UTC
(In reply to Ben Skeggs from comment #18)
> Are you able to try with a 4.3-rc kernel? This code got reworked to be more
> sane and hopefully fixes this issue as a side-effect.

Thanks for the reply, Ben. X does not work for me on Linux 4.2.1 (it will not start). I just tried compiling Linux 4.3-rc2, but it will not compile. I will try again when rc3 comes out; hopefully that will fix it.
Comment 20 ryanpcmcquen 2015-09-28 02:26:17 UTC
Linux 4.3-rc3 will not compile either. Here is the error:

  GEN     arch/x86/lib/inat-tables.c
  CC      arch/x86/lib/inat.o
  CC      arch/x86/lib/insn.o
  AS      arch/x86/lib/memcpy_64.o
  AS      arch/x86/lib/memmove_64.o
  AS      arch/x86/lib/memset_64.o
  CC      arch/x86/lib/misc.o
  AS      arch/x86/lib/putuser.o
  AS      arch/x86/lib/rwsem.o
  CC      arch/x86/lib/usercopy.o
  CC      arch/x86/lib/usercopy_64.o
  AR      arch/x86/lib/lib.a
  LINK    vmlinux
  LD      vmlinux.o
  MODPOST vmlinux.o
  GEN     .version
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  LD      init/built-in.o
arch/x86/built-in.o: In function `hv_machine_crash_shutdown':
mshyperv.c:(.text+0x31dcf): undefined reference to `native_machine_crash_shutdown'
make: *** [vmlinux] Error 1

http://sprunge.us/dPJQ
Comment 21 ryanpcmcquen 2015-09-28 02:26:35 UTC
Is there a patch I can try against nouveau?
Comment 22 ryanpcmcquen 2015-10-25 17:43:26 UTC
Created attachment 119187
dmesg for linux 4.3-rc7

I haven't had a freeze yet on Linux 4.3-rc7.

Do you notice anything wonky in dmesg?

http://sprunge.us/YEXF
Comment 23 ryanpcmcquen 2015-11-20 18:32:03 UTC
This appears to be fixed in the Nov 19 version of nouveau (git commit 6e6d8ac).
Comment 24 ryanpcmcquen 2015-11-21 00:23:56 UTC
I spoke too soon. The issue seems absent with the latest nouveau driver on Linux 4.2.x+, but still exists on 4.1.x. I now have Xorg 1.18.0, if that makes any difference.

