Bug 29800 - drm detects GPU lockup with dual cards/Xorg hard locks even with single screen
Status: RESOLVED NOTOURBUG
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2010-08-25 08:21 UTC by Thomas Spear
Modified: 2010-09-22 01:32 UTC (History)

Attachments
Xorg.0.log (44.39 KB, text/plain)
2010-08-25 13:45 UTC, Thomas Spear
dmesg which shows the gpu lockup (52.80 KB, text/plain)
2010-08-25 13:46 UTC, Thomas Spear
Picture of the problem that occurs (946.20 KB, image/jpeg)
2010-08-25 13:48 UTC, Thomas Spear
egrep "drm|nouveau|vgaarb" /var/log/messages (6.26 KB, text/plain)
2010-08-25 13:58 UTC, Thomas Spear
New dmesg without mesa-dri-drivers-experimental (47.46 KB, text/plain)
2010-08-25 14:27 UTC, Thomas Spear
/var/log/messages (145.39 KB, text/plain)
2010-08-25 14:30 UTC, Thomas Spear
New Xorg.0.log without mesa-dri-drivers-experimental (44.43 KB, text/plain)
2010-08-25 14:52 UTC, Thomas Spear

Description Thomas Spear 2010-08-25 08:21:49 UTC
[tspear@tomcat ~]$ cat /etc/redhat-release 
Fedora release 13 (Goddard)
[tspear@tomcat ~]$ uname -r
2.6.33.6-147.2.4.fc13.x86_64
[tspear@tomcat ~]$ rpm -qa |grep nouveau
xorg-x11-drv-nouveau-0.0.16-7.20100423git13c1043.fc13.x86_64
[tspear@tomcat ~]$ X -version

X.Org X Server 1.8.2
Release Date: 2010-07-01
X Protocol Version 11, Revision 0
Build Operating System: x86-16 2.6.32-44.el6.x86_64 
Current Operating System: Linux tomcat.localdomain 2.6.33.6-147.2.4.fc13.x86_64 #1 SMP Fri Jul 23 17:14:44 UTC 2010 x86_64
Kernel command line: ro root=/dev/mapper/vg_tomcat-lv_root rd_LVM_LV=vg_tomcat/lv_root rd_LVM_LV=vg_tomcat/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet 3
Build Date: 03 August 2010  05:10:46AM
Build ID: xorg-x11-server 1.8.2-3.fc13 
Current version of pixman: 0.18.0
	Before reporting problems, check http://bodhi.fedoraproject.org/
	to make sure that you have the latest version.
[tspear@tomcat ~]$ lspci -nn
05:00.0 VGA compatible controller [0300]: nVidia Corporation G86 [GeForce 8400 GS] [10de:0422] (rev a1)
06:00.0 VGA compatible controller [0300]: nVidia Corporation G86 [GeForce 8400 GS] [10de:0422] (rev a1)

Two PCI cards. The onboard Intel chip is enabled, but there is no video output to the monitor plugged into that port.

3-screen setup: the left screen goes to the Intel port, the center screen goes to the primary card, which I believe is PCI:5:0:0, and the right screen goes to the second card, which I believe is PCI:6:0:0.
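
The bus IDs above follow from the lspci output: lspci prints bus, device and function in hexadecimal, while Xorg's BusID notation uses decimal. A minimal sketch of that conversion, assuming plain bus:device.function addresses without a PCI domain prefix (the function name lspci_to_busid is made up for illustration):

```shell
# Hypothetical helper: convert an lspci address (hex bus:device.function)
# into the decimal PCI:bus:device:function form used for Xorg's BusID.
# Assumption: the address has no domain prefix such as "0000:".
lspci_to_busid() {
    addr=$1
    bus=${addr%%:*}      # text before the first ':'
    rest=${addr#*:}      # text after the first ':'
    dev=${rest%%.*}      # text before the '.'
    func=${rest#*.}      # text after the '.'
    # printf %d accepts 0x-prefixed hexadecimal arguments.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$func"
}

lspci_to_busid 05:00.0
lspci_to_busid 06:00.0
```

For these two cards the hex and decimal values happen to coincide; they differ once a bus or device number reaches 0a or higher.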

There is no xorg.conf file as Fedora makes use of default settings built into X.

During init, the first thing I see after selecting the OS is some messages from drm saying that it detected a lockup in GPU-0 and then again in GPU-1. After that there is no activity on the center screen (it is still getting a signal, but the screen does not update). The right screen gets the Fedora loading logo. If I hit Escape, the center screen shows the usual services starting, while the right screen loses the progress indicator but keeps the blue background.

Afterward, init completes normally as long as I am going to runlevel 3. I can then log in on the center screen, but the right screen continues to display the Fedora load screen indefinitely.

If I go to runlevel 5 at boot, I get a garbled center screen and the Fedora logo indefinitely on the right screen.

If I run startx from runlevel 3, the center screen gets the black and white dots screen but no mouse cursor, while the right screen again shows the Fedora load screen indefinitely. The machine then hard locks to the point that I have to pull the plug or hit the power button. After rebooting, there is nothing written to Xorg.0.log.

With the binary nvidia driver, the machine also hard locks after both displays are disabled, and I can see in Xorg.0.log that the nvidia driver attached to the first GPU has gone into a wait state.

This is a Dell Optiplex 780 with 2 PCI slots and 1 PCI-E x16 slot. I have also tried to get a single PCI-E card and a PCI card to work together (both 8400GS but from different manufacturers), to no avail.

I'm sure there is more info you need, so please ask. This is a work machine, but I am allowed to work on it to try to get 2 monitors working under X, with 2 physical cards, whatever way I have to.

For the record, Twinview works fine with a single dual-head card under nouveau, which is what I am using right now to post this. It also works under nvidia, but I am trying to stick with open source since I think I am more likely to get this working quickly that way. :-)
Comment 1 Thomas Spear 2010-08-25 13:43:37 UTC
As discussed in IRC, I've done the following:

-Updated the kernel per the instructions at http://nouveau.freedesktop.org/wiki/InstallDRM
-Updated the libdrm and xf86-video-nouveau per the instructions at http://nouveau.freedesktop.org/wiki/InstallNouveau

Below, you will find attachments that give more info about the problem.
Comment 2 Thomas Spear 2010-08-25 13:45:46 UTC
Created attachment 38149 [details]
Xorg.0.log
Comment 3 Thomas Spear 2010-08-25 13:46:30 UTC
Created attachment 38150 [details]
dmesg which shows the gpu lockup
Comment 4 Thomas Spear 2010-08-25 13:48:01 UTC
Created attachment 38151 [details]
Picture of the problem that occurs

This happened when I booted into runlevel 3 and then ran startx as my user account. At this point, Num Lock is not responding, and I have to hard power off the machine, as Ctrl+Alt+Backspace, Ctrl+Alt+F(x), and Ctrl+Alt+Del all do not work.
Comment 5 Thomas Spear 2010-08-25 13:58:44 UTC
Created attachment 38152 [details]
egrep "drm|nouveau|vgaarb" /var/log/messages

I've stripped out entries from this that were from before and after the boot with the problem. If the full content of the messages file from this specific boot is needed, let me know and I can upload that instead.
Comment 6 Thomas Spear 2010-08-25 14:27:34 UTC
Created attachment 38154 [details]
New dmesg without mesa-dri-drivers-experimental
Comment 7 Thomas Spear 2010-08-25 14:30:33 UTC
Created attachment 38155 [details]
/var/log/messages

The log was cleared just before rebooting to install the problematic cards.
Comment 8 Thomas Spear 2010-08-25 14:52:29 UTC
Created attachment 38156 [details]
New Xorg.0.log without mesa-dri-drivers-experimental
Comment 9 Thomas Spear 2010-08-30 11:57:10 UTC
After a coworker got his F13 x86 install working with 3 monitors (with the blob) by removing NetworkManager, I started looking into my problem some more, and first tried removing NetworkManager. That did not change anything.

So I searched for the NVIDIA(0) WAIT error that I had at the bottom of the X log from when I used the blob. Sure enough, the first result is a Red Hat bug (closed as NOTABUG). The error is apparently in the Intel IOMMU, and adding intel_iommu=off to the kernel boot command line seems to fix the hard lockup and allows my x86_64 install of F13 to use all 3 screens on both cards with the blob.
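
To confirm that the parameter actually reached the running kernel after a reboot, something like the following can be used. This is only a sketch: has_kernel_param is a made-up helper name, and it assumes /proc/cmdline is readable and that the parameter is spelled exactly intel_iommu=off.

```shell
# Hypothetical helper: succeeds if the given kernel command line
# contains the exact parameter as a whitespace-separated word.
has_kernel_param() {
    param=$1
    cmdline=$2
    for word in $cmdline; do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    return 1
}

# Check the command line the running kernel was booted with.
if [ -r /proc/cmdline ] && has_kernel_param intel_iommu=off "$(cat /proc/cmdline)"; then
    echo "intel_iommu=off is active"
else
    echo "intel_iommu=off is not set"
fi
```

How the parameter gets added depends on the boot loader; it belongs on the same kernel line that already carries options such as rhgb and quiet.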

I will check if this has also fixed nouveau sometime soon and report back if it has.
Comment 10 Thomas Spear 2010-09-22 01:32:58 UTC
I believe it is safe to assume that the same fix will work with nouveau as well, though I will not have time to test it any time in the foreseeable future. I am marking this as NOTOURBUG.

