Bug 91125 - [NVE7] Nouveau read fault, locking up the gpu
Summary: [NVE7] Nouveau read fault, locking up the gpu
Status: RESOLVED DUPLICATE of bug 92504
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
 
Reported: 2015-06-27 12:15 UTC by exi+freedesktop
Modified: 2015-10-20 18:37 UTC


Attachments
complete journal output including dmesg/kernel messages (1018.87 KB, text/plain)
2015-06-29 14:43 UTC, exi+freedesktop
System Journal for 2015-08-17 (507.71 KB, text/plain)
2015-08-17 23:18 UTC, Jonathan Ryshpan
Desktop as it should be (61.77 KB, image/jpeg)
2015-08-17 23:20 UTC, Jonathan Ryshpan
Corrupt desktop (63.48 KB, image/jpeg)
2015-08-17 23:23 UTC, Jonathan Ryshpan

Description exi+freedesktop 2015-06-27 12:15:44 UTC
This just happened to me after a suspend/resume cycle.
I cannot reproduce the read fault; it happened right after I opened a Plasma context menu, but I still wanted it documented:

Kernel:
Linux bugbox 4.0.5-1-ARCH #1 SMP PREEMPT Sat Jun 6 18:37:49 CEST 2015 x86_64 GNU/Linux

Xorg -version:
X.Org X Server 1.17.2
Release Date: 2015-06-16
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.0.4-2-ARCH x86_64 
Current Operating System: Linux bugbox 4.0.5-1-ARCH #1 SMP PREEMPT Sat Jun 6 18:37:49 CEST 2015 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=073feb51-816c-4a86-8de3-5e9ed3145571 rw resume=UUID=ba1b46a0-89f5-4ddd-adc0-5982d576f73c libahci.ignore_sss=1 swapaccount=1 quiet
Build Date: 16 June 2015  05:24:27PM
Current version of pixman: 0.32.6
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.

Mesa: mesa 10.6.0-1
Nouveau: xf86-video-nouveau 1.0.11-3

Dmesg error messages:
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] suspending console...
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] suspending display...
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] evicting buffers...
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] waiting for kernel channels to go idle...
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] suspending client object trees...
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] suspending kernel object tree...
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] re-enabling device...
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] resuming kernel object tree...
Jun 27 13:19:35 bugbox kernel: nouveau  [   VBIOS][0000:01:00.0] running init tables
Jun 27 13:19:35 bugbox kernel: nouveau  [    VOLT][0000:01:00.0] GPU voltage: 875000uv
Jun 27 13:19:35 bugbox kernel: nouveau  [  PTHERM][0000:01:00.0] fan management: automatic
Jun 27 13:19:35 bugbox kernel: nouveau  [     CLK][0000:01:00.0] --: core 405 MHz memory 648 MHz 
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] resuming client object trees...
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] resuming display...
Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] resuming console...
Jun 27 13:19:35 bugbox kernel: Modules linked in: xt_multiport ipt_REJECT nf_reject_ipv4 xt_comment hid_lenovo rfcomm ecb bnep hid_apple nouveau mxm_wmi ttm drm_kms_helper drm i2c_algo_bit veth tun fuse ctr ccm btrfs xor ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter snd_hda_codec_hdmi bridge stp llc raid6_pq snd_hda_codec_realtek joydev mousedev snd_hda_codec_generic psmouse serio_raw nls_iso8859_1 atkbd nls_cp437 iTCO_wdt hid_generic iTCO_vendor_support libps2 vfat fat intel_rapl iosf_mbi uvcvideo x86_pkg_temp_thermal arc4 videobuf2_vmalloc intel_powerclamp usbhid videobuf2_memops btusb coretemp videobuf2_core v4l2_common videodev hid kvm_intel bluetooth media rtl8192ce rtl_pci kvm rtl8192c_common crct10dif_pclmul
Jun 27 13:19:38 bugbox kernel: nouveau E[plasmashell[11934]] fail set_domain
Jun 27 13:19:38 bugbox kernel: nouveau E[plasmashell[11934]] validating bo list
Jun 27 13:19:38 bugbox kernel: nouveau E[plasmashell[11934]] validate: -22
Jun 27 13:19:38 bugbox kernel: nouveau E[plasmashell[11934]] fail set_domain
Jun 27 13:19:38 bugbox kernel: nouveau E[plasmashell[11934]] validating bo list
Jun 27 13:19:38 bugbox kernel: nouveau E[plasmashell[11934]] validate: -22
Jun 27 13:31:13 bugbox kernel: nouveau E[plasmashell[11934]] fail set_domain
Jun 27 13:31:13 bugbox kernel: nouveau E[plasmashell[11934]] validating bo list
Jun 27 13:31:13 bugbox kernel: nouveau E[plasmashell[11934]] validate: -22
Jun 27 13:31:13 bugbox kernel: nouveau E[plasmashell[11934]] fail set_domain
Jun 27 13:31:13 bugbox kernel: nouveau E[plasmashell[11934]] validating bo list
Jun 27 13:31:13 bugbox kernel: nouveau E[plasmashell[11934]] validate: -22
Jun 27 13:31:13 bugbox kernel: nouveau E[plasmashell[11934]] fail set_domain
Jun 27 13:31:13 bugbox kernel: nouveau E[plasmashell[11934]] validating bo list
Jun 27 13:31:13 bugbox kernel: nouveau E[plasmashell[11934]] validate: -22
Jun 27 13:31:13 bugbox kernel: nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0000ae1000 [PTE] from GR/GPC0/T1_0 on channel 0x007f74b000 [plasmashell[11934]]
Jun 27 13:31:13 bugbox kernel: nouveau E[   PFIFO][0000:01:00.0] PGR engine fault on channel 8, recovering...
Jun 27 13:31:13 bugbox kernel: nouveau E[     PGR][0000:01:00.0] TRAP ch 8 [0x007f74b000 plasmashell[11934]]
Jun 27 13:31:13 bugbox kernel: nouveau E[     PGR][0000:01:00.0] GPC0/TPC0/TEX: 0x80000049
Jun 27 13:32:57 bugbox kernel: nouveau E[plasmashell[11934]] failed to idle channel 0xcccc0000 [plasmashell[11934]]
Jun 27 13:33:12 bugbox kernel: nouveau E[plasmashell[11934]] failed to idle channel 0xcccc0000 [plasmashell[11934]]



The "validating bo list" errors happen on every suspend/resume, so I can reproduce them.
The read fault and the "failed to idle channel" error happened for the first time and froze my screen.
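
To tell the reproducible validation errors apart from the one-off read fault in a long journal, the nouveau error lines can be tallied by type. The following is only a rough sketch: the substrings are copied from the log excerpt above, while the category labels and the helper names `classify` and `summarize` are made up for illustration and are not part of nouveau or any existing tool.

```python
from collections import Counter

# (substring from the log excerpt, illustrative category label)
PATTERNS = (
    ("read fault", "read_fault"),
    ("failed to idle channel", "failed_to_idle"),
    ("fail set_domain", "fail_set_domain"),
    ("validating bo list", "validating_bo_list"),
    ("validate: -22", "validate_minus_22"),
)


def classify(line):
    """Return a category label for a nouveau error line, or None otherwise."""
    # Error-level nouveau messages in the excerpt are tagged "nouveau E[".
    if "nouveau E[" not in line:
        return None
    for needle, label in PATTERNS:
        if needle in line:
            return label
    return "other"


def summarize(lines):
    """Count nouveau error lines per category."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


sample = [
    "Jun 27 13:19:35 bugbox kernel: nouveau  [     DRM] resuming console...",
    "Jun 27 13:19:38 bugbox kernel: nouveau E[plasmashell[11934]] fail set_domain",
    "Jun 27 13:19:38 bugbox kernel: nouveau E[plasmashell[11934]] validating bo list",
    "Jun 27 13:19:38 bugbox kernel: nouveau E[plasmashell[11934]] validate: -22",
    "Jun 27 13:31:13 bugbox kernel: nouveau E[   PFIFO][0000:01:00.0] read fault at 0x0000ae1000 [PTE] from GR/GPC0/T1_0 on channel 0x007f74b000 [plasmashell[11934]]",
]
print(summarize(sample))
```

The same functions could be fed the lines of a saved kernel journal (for example the output of `journalctl -k` written to a file) instead of the hard-coded sample.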
Comment 1 exi+freedesktop 2015-06-29 14:43:13 UTC
Created attachment 116800 [details]
complete journal output including dmesg/kernel messages

I just got this bug again freezing up my screen.
Just before restarting the computer via ssh I saw a "gpu lockup switching to..." message on the screen.
Comment 2 Jonathan Ryshpan 2015-08-17 23:11:34 UTC
I hit this bug frequently, usually after some minor user action like closing a window.  When it happens the screen freezes except for the cursor; when I look at a console via ctrl/f2 and wait for a while (about 2 minutes) I see messages like:
[20235.667444] nouveau E[plasmashell][16831] fail set_domain
[20235.667782] nouveau E[plasmashell][16831] validating bo list
[20235.668235] nouveau E[plasmashell][16831] validate: -22

As soon as these messages appear, the desktop becomes mostly usable, but often strange; the icons become weird and some system messages are corrupted.  In a following comment (after I get the desktop back), I will attach a snapshot of the weird desktop and today's journal file.

The system is difficult to use with this bug.

System is
  Amd-64 4-processor hardware
  Nvidia 9500GT Graphics Card
  Fedora-22 with all updates
  KDE Frameworks 5.12.0

Xorg -version
X.Org X Server 1.17.2
Release Date: 2015-06-16
X Protocol Version 11, Revision 0
Build Operating System:  4.0.4-202.fc21.x86_64 
Current Operating System: Linux amito 4.1.4-200.fc22.x86_64 #1 SMP Tue Aug 4 03:22:33 UTC 2015 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-4.1.4-200.fc22.x86_64 root=/dev/mapper/fedora00-root ro rd.lvm.lv=fedora00/swap rd.lvm.lv=fedora00/root rhgb quiet LANG=en_US.UTF-8
Build Date: 15 July 2015  08:16:41AM
Build ID: xorg-x11-server 1.17.2-2.fc22 
Current version of pixman: 0.32.6
Comment 3 Jonathan Ryshpan 2015-08-17 23:18:19 UTC
Created attachment 117741 [details]
System Journal for 2015-08-17
Comment 4 Jonathan Ryshpan 2015-08-17 23:20:36 UTC
Created attachment 117742 [details]
Desktop as it should be

This is a part of my desktop when displayed properly.
Comment 5 Jonathan Ryshpan 2015-08-17 23:23:25 UTC
Created attachment 117743 [details]
Corrupt desktop

This is the same part of my desktop as displayed when the graphics system is in its "weird" state.
Comment 6 David March 2015-08-18 15:59:42 UTC
Probable duplicate bug (though this current bug was filed earlier) with more information:

https://bugs.freedesktop.org/show_bug.cgi?id=91598

Apparently, per the above, "fail set_domain" means that it is running out of VRAM.
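
As a side note on the "validate: -22" lines: Linux kernel code conventionally reports failures as negated errno values, so -22 corresponds to EINVAL ("Invalid argument"). A minimal sketch for decoding such a value, where the helper name `errno_name` is made up for illustration; this only names the error code and does not by itself confirm the VRAM explanation:

```python
import errno
import os


def errno_name(code):
    """Map a negated kernel-style error code (e.g. -22) to its symbolic name."""
    return errno.errorcode[abs(code)]


code = -22  # value from the "validate: -22" log lines
print(errno_name(code), "-", os.strerror(abs(code)))
```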
Comment 7 Ilia Mirkin 2015-10-19 23:26:26 UTC
I may have made some headway on this issue in bug 92504 (comment 20). Please see if the patch in that comment helps.
Comment 8 Ilia Mirkin 2015-10-20 18:37:40 UTC

*** This bug has been marked as a duplicate of bug 92504 ***

