Bug 92892

Summary: [NVA8/NV98] KDE Plasma locks up: Nouveau reports error "resource sanity check" "unable to handle kernel paging request"
Product: Mesa Reporter: Volker Lukas <vlukas>
Component: Drivers/DRI/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Nouveau Project <nouveau>
Severity: normal    
Priority: medium CC: doktor.yak, tiwai
Version: 11.0   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=92962
https://bugs.kde.org/show_bug.cgi?id=358142
https://bugzilla.opensuse.org/show_bug.cgi?id=959732
https://bugzilla.redhat.com/show_bug.cgi?id=1303643
Attachments: Dmesg from system start until and including the second time a kernel backtrace appears
Output of hwinfo --gfx
Version information of some relevant packages
Recent weeks of package install history
dmesg 4.5.0-0.rc2.git0.1.fc24.x86_64 nouveau KDE5

Description Volker Lukas 2015-11-10 19:13:13 UTC
Created attachment 119547 [details]
Dmesg from system start until and including the second time a kernel backtrace appears

I was referred by my distribution's bug tracker to report an error related to the Nouveau driver here.

I use the "Tumbleweed" distribution by Opensuse. This is a kind of rolling release, with package upgrades roughly once a week. On 2015-11-08 I pulled that distribution's latest snapshot, which installed, among a few other packages, Linux kernel version 4.3.0 (an upgrade from 4.2.4).

Since then I have been experiencing serious issues when using the KDE desktop. Beginning on that Sunday (i.e. 2015-11-08), after using the desktop for some time, I noticed that KDE's text editor ("Kwrite") would no longer start when launched from the file manager. Initially I thought this was a communication problem inside KDE, because I could restart the file manager and managed to open one text file. But then, attempting to launch a second instance of Kwrite via the file manager failed again. I tried repeatedly and found that after very few attempts Kwrite could not be launched at all. At that point I could still interact with other, already running programs. But after some time the whole desktop locked up. Not even switching to a text console via Ctrl + F1 worked. The system had to be rebooted.

I can now reproduce a whole desktop lockup by this simple procedure:

- Power on.
- Log in via KDM.
- Press Alt + F2, then type konsole in the mini command line.
- Enter dmesg in the Konsole window.
- Open a second Konsole tab.
- In that new tab, type kwrite. Kwrite does not launch successfully on this attempt.


To gather information, I installed kernel version 4.2.4 from the distribution's package alongside 4.3.0.

When I boot 4.2.4, I cannot reproduce the desktop lockups.

I have attached the complete dmesg output from the reproduction procedure above. As you can see, there are some suspicious kernel backtraces related to Nouveau. One of them coincides closely in time with the attempt to launch Kwrite: when I type "dmesg" for the first time, I see only one backtrace; after entering kwrite, I can run dmesg again and spot the second kernel backtrace.

With slightly older kernel versions I also get these kernel backtraces in the system log (journalctl), but I do NOT experience whole desktop lockups. With even older kernel versions, I do not get these types of kernel backtraces at all.
Below are the lines from the first time journalctl shows a similar backtrace. (The installed kernel must have been 4.2.3, as far as I can determine from the package install history logfile.)
----- Kernel 4.2.3: ------------------------------------------------------------
Okt 27 15:42:10 linux-5rjk kernel: resource sanity check: requesting [mem 0xddf6d000-0xde06cfff], which spans more than 0000:01:00.0 [mem 0xdc000000-0xddffffff 64bit pref]
Okt 27 15:42:10 linux-5rjk kernel: ------------[ cut here ]------------
Okt 27 15:42:10 linux-5rjk kernel: WARNING: CPU: 0 PID: 5113 at ../arch/x86/mm/ioremap.c:198 __ioremap_caller+0x2de/0x360()
Okt 27 15:42:10 linux-5rjk kernel: Info: mapping multiple BARs. Your kernel is fine.
Okt 27 15:42:10 linux-5rjk kernel: Modules linked in:
Okt 27 15:42:10 linux-5rjk kernel:  nf_log_ipv6 xt_pkttype nf_log_ipv4 nf_log_common xt_LOG xt_limit iscsi_ibft iscsi_boot_sysfs af_packet ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables snd_hda_codec_hdmi snd_hda_codec_analog snd_hda_codec_generic iTCO_wdt gpio_ich iTCO_vendor_support ppdev dm_mod coretemp kvm_intel kvm pcspkr i2c_i801 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep lpc_ich mfd_core snd_pcm asus_atk0110 8250_fintek parport_pc parport snd_timer nouveau snd mxm_wmi wmi video ttm drm_kms_helper drm i2c_algo_bit acpi_cpufreq button processor shpchp soundcore hid_generic usbhid
Okt 27 15:42:10 linux-5rjk kernel:  ata_generic serio_raw firewire_ohci firewire_core crc_itu_t atl1 mii pata_jmicron ehci_pci uhci_hcd ehci_hcd usbcore usb_common sg
Okt 27 15:42:10 linux-5rjk kernel: CPU: 0 PID: 5113 Comm: kwrite Not tainted 4.2.3-1-default #1
Okt 27 15:42:10 linux-5rjk kernel: Hardware name: System manufacturer System Product Name/P5B-E, BIOS 1002    01/30/2007
Okt 27 15:42:10 linux-5rjk kernel:  ffffffff81a20135 ffff880180b93758 ffffffff81661dad 0000000000000007
Okt 27 15:42:10 linux-5rjk kernel:  ffff880180b937a8 ffff880180b93798 ffffffff81068246 ffffc90006cfffff
Okt 27 15:42:10 linux-5rjk kernel:  0000000000100000 ffffc90006c00000 00000000ddf6d000 0000000000000000
Okt 27 15:42:10 linux-5rjk kernel: Call Trace:
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff81007a15>] try_stack_unwind+0x175/0x190
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff81006223>] dump_trace+0x93/0x3a0
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff81007a7f>] show_trace_log_lvl+0x4f/0x60
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff8100663c>] show_stack_log_lvl+0x10c/0x180
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff81007b15>] show_stack+0x25/0x50
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff81661dad>] dump_stack+0x4c/0x6e
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff81068246>] warn_slowpath_common+0x86/0xc0
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff810682c6>] warn_slowpath_fmt+0x46/0x50
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff8105425e>] __ioremap_caller+0x2de/0x360
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff810542f7>] ioremap_nocache+0x17/0x20
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa0234e72>] nvkm_barobj_ctor+0xc2/0xf0 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa02338d1>] nvkm_object_ctor+0x31/0xd0 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa0234ece>] nvkm_bar_alloc+0x2e/0x40 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa023092d>] nvkm_gpuobj_create_+0x26d/0x2a0 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa023099d>] _nvkm_gpuobj_ctor+0x3d/0x50 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa02338d1>] nvkm_object_ctor+0x31/0xd0 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa02309fc>] nvkm_gpuobj_new+0x4c/0x50 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa0274f41>] nvkm_vm_get+0x171/0x2c0 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa02c4e9e>] nouveau_bo_vma_add+0x2e/0x90 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa02d63c5>] nouveau_channel_prep+0x215/0x2f0 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa02d6511>] nouveau_channel_new+0x71/0x700 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa02d53da>] nouveau_abi16_ioctl_channel_alloc+0x12a/0x3f0 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa01493a5>] drm_ioctl+0x125/0x610 [drm]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffffa02bdab0>] nouveau_drm_ioctl+0x70/0xd0 [nouveau]
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff811f2bf5>] do_vfs_ioctl+0x285/0x460
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff811f2e49>] SyS_ioctl+0x79/0x90
Okt 27 15:42:10 linux-5rjk kernel:  [<ffffffff81667e32>] entry_SYSCALL_64_fastpath+0x16/0x75
Okt 27 15:42:10 linux-5rjk kernel: DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x75
Okt 27 15:42:10 linux-5rjk kernel: 
Okt 27 15:42:10 linux-5rjk kernel: Leftover inexact backtrace:
Okt 27 15:42:10 linux-5rjk kernel: ---[ end trace d43371eb12dab49d ]---
Okt 27 15:42:10 linux-5rjk kernel: nouveau E[kwrite[5113]] channel failed to initialise, -12
Okt 27 15:42:13 linux-5rjk kernel: SFW2-INext-DROP-DEFLT IN=enp3s0 OUT= MAC [... cut long line, the reporter]
Okt 27 15:42:16 linux-5rjk kernel: SFW2-INext-DROP-DEFLT IN=enp3s0 OUT= MAC [... cut long line, the reporter]
Okt 27 15:42:30 linux-5rjk kernel: resource sanity check: requesting [mem 0xddf6d000-0xde06cfff], which spans more than 0000:01:00.0 [mem 0xdc000000-0xddffffff 64bit pref]
Okt 27 15:42:30 linux-5rjk kernel: nouveau E[kwrite[5122]] channel failed to initialise, -12
--------------------------------------------------------------------------------
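The numbers in the "resource sanity check" lines can be checked by hand. The short Python sketch below is only an illustration of that arithmetic, not kernel code, and its helper names are hypothetical. It shows that the requested mapping is 1 MiB, starts 0x93000 bytes before the end of the 32 MiB BAR of 0000:01:00.0, and therefore runs 0x6d000 bytes past it. It also decodes the "-12" in the "channel failed to initialise" line, assuming nouveau reports negated Linux errno values.

```python
import errno
import os

# Values copied from the "resource sanity check" line in the log above.
REQ_START, REQ_END = 0xDDF6D000, 0xDE06CFFF  # range requested for mapping
BAR_START, BAR_END = 0xDC000000, 0xDDFFFFFF  # BAR of PCI device 0000:01:00.0


def span_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive [start, end] address range."""
    return end - start + 1


def overflow_past(req_end: int, bar_end: int) -> int:
    """Bytes by which a request runs past the end of the BAR (0 if it fits)."""
    return max(0, req_end - bar_end)


def spans_more_than(req_start: int, req_end: int, bar_start: int, bar_end: int) -> bool:
    """True if the requested range is not fully contained in the BAR."""
    return req_start < bar_start or req_end > bar_end


if __name__ == "__main__":
    print(hex(span_size(REQ_START, REQ_END)))    # 0x100000 (1 MiB mapping)
    print(hex(span_size(BAR_START, BAR_END)))    # 0x2000000 (32 MiB BAR)
    print(hex(overflow_past(REQ_END, BAR_END)))  # 0x6d000 bytes past the BAR
    print(spans_more_than(REQ_START, REQ_END, BAR_START, BAR_END))  # True
    # Assumption: "-12" is a negated errno value; 12 is ENOMEM on Linux.
    print(errno.errorcode[12], os.strerror(12))
```

Under that assumption, the channel allocation in kwrite failed with an out-of-memory style error right after the oversized mapping request.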


For your information I have attached files showing the package install history (only the most recent weeks), output of "hwinfo --gfx" and a bit of information about installed packages (I hope you can make something out of the RPM output, if not I am glad to supply any missing information). Of course I attached the dmesg output as well, as written above.

For reference, I opened this report in Opensuse's bug tracker: https://bugzilla.opensuse.org/show_bug.cgi?id=954473

For further information: for some time now (since about mid-2015) I have also been getting Nouveau failure messages similar to those attached here: https://bugs.freedesktop.org/show_bug.cgi?id=92504
But these do not usually provoke hard desktop lockups, and they only appear when I also use Firefox, which I do sparingly. So that is likely a separate problem, with at most a weak relation to my recent troubles.
Comment 1 Volker Lukas 2015-11-10 19:13:43 UTC
Created attachment 119548 [details]
Output of hwinfo --gfx
Comment 2 Volker Lukas 2015-11-10 19:14:22 UTC
Created attachment 119549 [details]
Version information of some relevant packages
Comment 3 Volker Lukas 2015-11-10 19:15:28 UTC
Created attachment 119550 [details]
Recent weeks of package install history
Comment 4 Ilia Mirkin 2015-11-10 19:17:21 UTC
Nouveau underwent a significant rewrite for kernel 4.3. Any chance you could bisect the changes to drivers/gpu/drm/nouveau between v4.2 and v4.3?
Comment 5 Volker Lukas 2015-11-10 21:58:16 UTC
I will try to bisect between 4.2 and 4.3. I will likely not report back until the weekend. Thanks for answering so fast.
Comment 6 Andreas Nordal 2015-11-24 16:17:13 UTC
I believe I'm seeing the same bug:
* plasma5 hangs with kernel 4.3, but not with 4.2
* "resource sanity check" in /var/log/messages

I have an "NVIDIA Corporation G98 [Quadro NVS 295] (rev a1)" as seen by lspci (Dell workstation).

I have kernel-default-4.3.0-2.1 on Tumbleweed, but when I had kernel-default-4.3.0-1.1, I was also seeing "DRM: GPU lockup - switching to software fbcon" like in #92971.
Comment 7 Volker Lukas 2015-11-30 11:19:13 UTC
I used git bisect to find the first bad kernel revision. This is Git's "BISECT_LOG":

git bisect start
# good: [1c02865136fee1d10d434dc9e3616c8e39905e9b] Linux 4.2.6
git bisect good 1c02865136fee1d10d434dc9e3616c8e39905e9b
# bad: [6ff33f3902c3b1c5d0db6b1e2c70b6d76fba357f] Linux 4.3-rc1
git bisect bad 6ff33f3902c3b1c5d0db6b1e2c70b6d76fba357f
# good: [64291f7db5bd8150a74ad2036f1037e6a0428df2] Linux 4.2
git bisect good 64291f7db5bd8150a74ad2036f1037e6a0428df2
# good: [dd5cdb48edfd34401799056a9acf61078d773f90] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good dd5cdb48edfd34401799056a9acf61078d773f90
# bad: [f377ea88b862bf7151be96d276f4cb740f8e1c41] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect bad f377ea88b862bf7151be96d276f4cb740f8e1c41
# good: [abebcdfb64f1b39eeeb14282d9cd4aad1ed86f8d] Merge tag 'sound-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good abebcdfb64f1b39eeeb14282d9cd4aad1ed86f8d
# good: [bef2c7bd578e91c9c10983e0c15c4501127b77ca] Merge tag 'drm/tegra/for-4.3-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
git bisect good bef2c7bd578e91c9c10983e0c15c4501127b77ca
# good: [99336ed363f49f484b4d93600c4dfec1f2ebb84a] drm/nouveau/ltc: switch to device pri macros
git bisect good 99336ed363f49f484b4d93600c4dfec1f2ebb84a
# bad: [97070f23c60869830039b216ff88230f54ef7107] drm/nouveau/pm: convert to new-style nvkm_engine
git bisect bad 97070f23c60869830039b216ff88230f54ef7107
# good: [c813d8e048740ca82b88a9d3f639bbd8095b24ac] drm/nouveau/bin: punt client/device argument handling into a common helper
git bisect good c813d8e048740ca82b88a9d3f639bbd8095b24ac
# bad: [6157091177102638c7d94ffc159c0b157a1c9b56] drm/nouveau/sw: remove dependence on namedb/engctx lookup
git bisect bad 6157091177102638c7d94ffc159c0b157a1c9b56
# good: [168c2e213d3a9b605856d3676d9e93733c8b37d3] drm/nouveau/engine: implement support for new-style nvkm_engine
git bisect good 168c2e213d3a9b605856d3676d9e93733c8b37d3
# good: [358ce601ae5de59bf6f08f79455c5b3cb7d359d4] drm/nouveau/fifo: directly use instmem for runlists and polling areas
git bisect good 358ce601ae5de59bf6f08f79455c5b3cb7d359d4
# bad: [344c2d429dd86b1b0113177e18f15adb74e9d936] drm/nouveau/fb: remove dependence on namedb/engctx lookup
git bisect bad 344c2d429dd86b1b0113177e18f15adb74e9d936
# bad: [1d2a1e53865266a67fb569705eba3ec992682721] drm/nouveau/ramht: remove dependence on namedb
git bisect bad 1d2a1e53865266a67fb569705eba3ec992682721
# good: [f027f49166171c98d5945af12ac3ee9bc9f9bf4c] drm/nouveau/gpuobj: separate allocation from nvkm_object
git bisect good f027f49166171c98d5945af12ac3ee9bc9f9bf4c
# first bad commit: [1d2a1e53865266a67fb569705eba3ec992682721] drm/nouveau/ramht: remove dependence on namedb
Comment 8 doktor.yak 2015-12-03 20:52:43 UTC
I have a very similar setup.
- OpenSUSE Tumbleweed
- Dell Laptop (E6510)
- NVIDIA Corporation GT218M [NVS 3100M]
- upgraded kernel to 4.3.0


And exactly the same symptoms
(down to the same call backtrace).


So I can help test driver fixes if needed.

Also @Volker Lukas:
- Where did you get the older, still-working copy (4.2.4?)
I would like to download it and use it until the 4.3.0 kernel gets fixed, but all the Tumbleweed mirrors seem to have deleted the older kernel RPMs and only have the latest one.
Comment 9 Volker Lukas 2015-12-04 10:47:20 UTC
Hi doktor.yak,

at the time I encountered this bug, the Opensuse Linux 4.2.4 RPM was still downloadable.

If you build linux-4.2.6.tar.xz from kernel.org via "make rpm" you should be able to get a working kernel package. You can copy the /boot/config-4.x-something to the kernel source directory to copy the build configuration (rename it to ".config").
Comment 10 doktor.yak 2015-12-04 11:05:14 UTC
Thanks for your answer.

Do you know if Suse did apply any patch on their version of the 4.2.4 kernel ?

Otherwise I'll follow your recommendation and compile a vanilla kernel. (with "make oldconfig"-ing /proc/config.gz)
Comment 11 Takashi Iwai 2015-12-04 16:12:57 UTC
(In reply to doktor.yak from comment #10)
> Thanks for your answer.
> 
> Do you know if Suse did apply any patch on their version of the 4.2.4 kernel
> ?

Nothing about nouveau.
 
> Otherwise I'll follow your recommendation and compile a vanilla kernel.
> (with "make oldconfig"-ing /proc/config.gz)

It's better anyway to compile it yourself, to rule out any subtle differences.
Comment 12 Volker Lukas 2016-01-24 14:11:16 UTC
With current Opensuse snapshots, this problem is apparently gone. One notable upgrade is Linux 4.4.0, but the X server, Mesa, KDE, Qt, etc. were also upgraded.
Comment 13 poma 2016-02-03 03:29:54 UTC
Created attachment 121479 [details]
dmesg 4.5.0-0.rc2.git0.1.fc24.x86_64 nouveau KDE5


SW:
kernel-modules-4.5.0-0.rc2.git0.1.fc24.x86_64
libdrm-2.4.66-1.fc24.x86_64
xorg-x11-server-Xorg-1.18.0-5.fc24.x86_64
xorg-x11-drv-nouveau-1.0.12-1.fc24.x86_64
mesa-dri-drivers-11.2.0-0.devel.8.24ea81a.fc24.x86_64
plasma-workspace-5.5.4-1.fc24.x86_64
qt5-qtdeclarative-5.6.0-0.7.beta.fc24.x86_64

HW:
NVIDIA G98
Comment 14 poma 2016-02-03 05:23:01 UTC
After upgrade to:
$ rpm --query --file /usr/lib64/libQt5Qml.so.5.6.0
qt5-qtdeclarative-5.6.0-0.8.beta.fc24.x86_64
KDE5 starts without hassle


Ref.
- Info: qt5-qtdeclarative-5.6.0-0.8.beta.fc24
  http://koji.fedoraproject.org/koji/buildinfo?buildID=715479
  "build with -fno-delete-null-pointer-checks to workaround gcc6-related runtime crashes (#1303643)"

- "qt5-qtdeclarative-5.6.0-0.7.beta.fc24 broken"
  https://bugzilla.redhat.com/show_bug.cgi?id=1303643
Comment 15 poma 2016-02-04 14:04:35 UTC
http://download.opensuse.org/tumbleweed/iso/
openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20160130-Media.iso
works OK
