92961 – Xorg freezes (only mouse and ssh are still working)

Summary: Xorg freezes (only mouse and ssh are still working)

Status:	RESOLVED MOVED

Alias:	None

Product:	xorg
Classification:	Unclassified
Component:	Driver/nouveau
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	high critical
Assignee:	Nouveau Project
QA Contact:	Xorg Project Team

Reported:	2015-11-15 10:41 UTC by ruben
Modified:	2019-12-04 09:06 UTC
CC List:	4 users

Attachments
dmesg output from boot onwards (75.69 KB, text/plain) 2015-11-15 10:41 UTC, ruben
X log as part of the system log redirected by GDM (4.71 MB, text/plain) 2015-11-15 11:11 UTC, ruben
Kernel log 1 (3.47 KB, text/plain) 2016-10-18 16:22 UTC, Marcelo Jimenez
Kernel log 2 (6.89 KB, text/plain) 2016-10-18 16:23 UTC, Marcelo Jimenez
Kernel log 3 (1.69 KB, text/plain) 2016-10-18 16:23 UTC, Marcelo Jimenez
shutdown log when the system was frozen (360 bytes, text/plain) 2018-02-13 07:44 UTC, Tulli0
reset variable ret for return EBUSY in gpfifogf100.c and gpfifogk104.c (695 bytes, patch) 2018-02-13 08:57 UTC, Tulli0
Handle INTR 0x00800000 in gf100_fifo_intr (674 bytes, patch) 2018-12-02 23:12 UTC, Nic Soudée

Description ruben 2015-11-15 10:41:06 UTC

Created attachment 119677 [details]
dmesg output from boot onwards

The desktop freezes consistently after a few minutes using the nouveau driver.

Only the mouse can still be used. The system is still fully operational and reachable via ssh.

Comment 1 ruben 2015-11-15 11:11:25 UTC

Created attachment 119678 [details]
X log as part of the system log redirected by GDM

Comment 2 Pierre Moreau 2015-11-15 11:34:55 UTC

I would start by blacklisting the NVIDIA driver to prevent it from loading while Nouveau is in use, as it may result in conflicts. (Just create a file named `whichever_name_you_want.conf` in `/etc/modprobe.d/` containing `blacklist nvidia`.)

Regarding the second attachment, could you please instead attach Xorg.0.log as found in `~/.local/share/xorg/`, or `~/.local/xorg/` or `/var/log/`? (From a run that froze, and please attach the corresponding dmesg, so that both the dmesg and the Xorg.0.log are from the same run.)

(In reply to ruben from comment #0)
> The desktop freezes consistently after a few minutes using the nouveau
> driver.

Have you noticed anything that may cause the freeze, like starting a browser, running some OpenGL application, or even some fancy compositor animation?

Comment 3 Viorel-Cătălin Răpițeanu 2015-11-22 23:26:43 UTC

(In reply to ruben from comment #0)
> Created attachment 119677 [details]
> dmesg output from boot onwards
> 
> The desktop freezes consistently after a few minutes using the nouveau
> driver.
> 
> Only the mouse can still be used. The system is still fully operational and
> reachable via ssh.

I have the same behaviour on a Latitude e6420 with NVIDIA Corporation GF119M [NVS 4200M] GPU.

> Have you noticed anything that may cause the freeze, like starting a browser, 
> running some OpenGL application, or even some fancy compositor animation?

No. The crashes happen at random intervals. Most of the time no intensive application is open when it happens. As for the compositor, in my case the freeze happens even when no compositor is in use.
The software versions I'm using are:
xf86-video-nouveau 1.0.11+31+g1ff13a9
mesa 11.0.6
xorg-server 1.18.0
plasma-workspace 5.4.95 (it also happened with the stable version 5.4.3)

If you connect via ssh, the following error can be seen in dmesg:
> nouveau E[   PFIFO][0000:01:00.0] INTR 0x00800000
most of the time preceded by:
> nouveau W[   PFIFO][0000:01:00.0] INTR 0x01000000: 0x00000005

The full trace of this failure can also be seen in the Xorg log.
I've attached all logs to the reported 'Bug 71659'.
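
For anyone comparing their own dmesg against the two PFIFO lines in comment 3, it can help to see which bit positions are set in each status word. The sketch below is only arithmetic on the quoted values. The helper name `intr_set_bits` is made up for illustration, and nothing here says what the bits mean to the hardware, since the thread does not state that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store the positions of all set bits of `stat`
 * into `bits` (lowest first) and return how many were found.
 */
size_t intr_set_bits(uint32_t stat, unsigned bits[32])
{
    assert(bits != NULL);

    size_t n = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (stat & (UINT32_C(1) << i))
            bits[n++] = i;
    }
    return n;
}
```

Called on the values from the log, `0x00800000` yields the single bit 23, `0x01000000` yields the single bit 24, and the trailing `0x00000005` yields bits 0 and 2.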

Comment 4 Yoram 2015-11-23 10:46:09 UTC

I experience the same issue.
I have noticed that it only occurs if plasma is built with gles2 support (on gentoo, requiring the whole QT+KDE stack to be built with gles2).

A very similar freeze occurs with enlightenment/wayland, so I assume it's a bug in nouveau.

Comment 5 Viorel-Cătălin Răpițeanu 2015-11-23 18:17:54 UTC

(In reply to Yoram from comment #4)
> I experience the same issue.
> I have noticed that it only occurs if plasma is built with gles2 support (on
> gentoo, requiring the whole QT+KDE stack to be built with gles2).
> 
> A very similar freeze occurs with enlightenment/wayland, so I assume it's a
> bug in nouveau.

Thanks for the info!
To keep my DE, I've switched to KDE/Openbox for the time being. Things seem to work great for now.

Comment 6 ruben 2015-11-24 20:45:51 UTC

(In reply to Pierre Moreau from comment #2)
> I would start by blacklisting the NVIDIA driver, to prevent it from loading
> while Nouveau is being used as it may result in conflicts. (Just write a
> file named `whichever_name_you_want.conf` in `/etc/modprobe.d/`, and
> containing `blacklist nvidia`.)

The system was already like this, but it still crashes.

> Regarding the second attachment, could you rather please attach Xorg.0.log as
> found in `~/.local/share/xorg/`, or `~/.local/xorg/` or `/var/log/`? (From a
> run that froze, and please attach the corresponding dmesg, so that both the
> dmesg and the Xorg.log are from the same run.)

I am seldom at this computer at the moment, but if it crashes again I will do that. Still, the earlier attachments should provide the same information, shouldn't they?

> (In reply to ruben from comment #0)
> > The desktop freezes consistently after a few minutes using the nouveau
> > driver.
> 
> Have you noticed anything that may cause the freeze, like starting a
> browser, running some OpenGL application, or even some fancy compositor
> animation?

No, nothing special. It may even happen when I leave the computer practically untouched. In case it helps, I am using GNOME 3.

Comment 7 Marcelo Jimenez 2016-10-18 16:21:26 UTC

Recently I had to switch to nouveau due to a problem with the official nvidia drivers, and I have been experiencing random GUI freezes. The mouse still moves on the screen, but I cannot interact with the GUI.

The nvidia modules are NOT installed in the system.

I usually can connect via ssh and reboot the machine.

The freeze happens at random times, there is no particular sequence to produce it. I use KDE on opensuse 42.1.

lspci is:
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 730] (rev a1)

I will attach the pertinent kernel logs.

Comment 8 Marcelo Jimenez 2016-10-18 16:22:40 UTC

Created attachment 127379 [details]
Kernel log 1

Comment 9 Marcelo Jimenez 2016-10-18 16:23:09 UTC

Created attachment 127380 [details]
Kernel log 2

Comment 10 Marcelo Jimenez 2016-10-18 16:23:35 UTC

Created attachment 127381 [details]
Kernel log 3

Comment 11 Marcelo Jimenez 2016-10-18 16:24:36 UTC

Previously reported here: https://bugzilla.suse.com/show_bug.cgi?id=1004311

Comment 12 Ilia Mirkin 2016-10-18 16:33:03 UTC

(In reply to Marcelo Jimenez from comment #7)
> The freeze happens at random times, there is no particular sequence to
> produce it. I use KDE on opensuse 42.1.

A number of users have been experiencing issues with KDE across a variety of hardware with nouveau. My current advice is to either not use KDE or not use nouveau_dri.so. [It's most likely not KDE's fault, but this is the present reality.]

Separately, OpenSuSE included some patches of mine in their Mesa builds that address some but hardly all of the issues involved. I believe the approach to be fundamentally flawed and in need of redoing. (The latest may be that they have dropped those.)

Comment 13 Marcelo Jimenez 2016-10-18 16:51:21 UTC

Well, since the problem also happens on GNOME 3 (a report before mine), would you have any pointers on how to disable the use of nouveau_dri.so? Note that using the proprietary nvidia driver is currently not an option, since it is currently broken on my machine.

Comment 14 Ilia Mirkin 2016-10-18 16:56:23 UTC

(In reply to Marcelo Jimenez from comment #13)
> Well, since the problem also happens on GNOME 3 (a report before mine),
> would you have any pointers on how to disable the use of nouveau_dri.so?
> Note that using the proprietary nvidia driver is currently not an option,
> since it is currently broken on my machine.

"Random freezes" is not a single issue. Most likely the OP's issue is wholly unrelated to yours.

locate nouveau_dri.so, and remove it.

Comment 15 Tulli0 2018-02-13 07:44:05 UTC

Created attachment 137309 [details]
shutdown log when the system was frozen

Comment 16 Tulli0 2018-02-13 07:46:05 UTC

I have the same problem with GNOME 3 on Gentoo with the nouveau driver.
The freezes are random and happen while using a web browser, mail client and so on.
When I shut down the system for a reboot I can see the message that I attached.
I searched for this file [gpfifogf100.c] on the internet and found this patch:
https://patchwork.kernel.org/patch/9502079/

I restored the previous value of ret with:
"ret = -EBUSY;"
in gpfifogf100.c and gpfifogk104.c,
and the problem seems to be solved.

This is the fourth day of testing without a freeze.

Comment 17 Tulli0 2018-02-13 08:57:24 UTC

Created attachment 137310 [details] [review]
reset variable ret for return EBUSY in gpfifogf100.c and gpfifogk104.c

Patch for testing stability after nouveau driver freezes

Comment 18 Tulli0 2018-02-16 09:47:33 UTC

It froze again :-(
I tried raising the wait timeout from 2 to 3 seconds, but the system still freezes randomly.
File:
drivers/gpu/drm/nouveau/nvkm/engine/fifo/gpfifogf100.c
...
        if (nvkm_msec(device, 3000,
                if (nvkm_rd32(device, 0x002634) == chan->base.chid)
                        break;
...
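
For readers unfamiliar with the pattern in the excerpt above, here is a minimal userspace sketch of "poll a register until it holds the expected channel id, give up after a timeout". It is not the kernel's `nvkm_msec` macro. `poll_reg_equals`, `reg_read_fn` and the `fake_reg` stand-in for `nvkm_rd32` are hypothetical names, and returning `-EBUSY` on timeout is an assumption modelled on the `ret = -EBUSY;` line from comment 16.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical register accessor, standing in for a real MMIO read. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Fake register: reports `value` from the `ready_after`-th read onwards. */
struct fake_reg {
    unsigned reads;       /* number of reads performed so far */
    unsigned ready_after; /* read count at which `value` becomes visible */
    uint32_t value;
};

uint32_t fake_reg_read(void *ctx)
{
    struct fake_reg *r = ctx;
    r->reads++;
    return (r->reads >= r->ready_after) ? r->value : UINT32_MAX;
}

/* Monotonic clock in milliseconds. */
static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Busy-poll `rd` until it returns `expected` or `timeout_ms` elapses.
 * Returns 0 on success and -EBUSY on timeout (assumed convention).
 * The register is always read at least once.
 */
int poll_reg_equals(reg_read_fn rd, void *ctx, uint32_t expected,
                    int64_t timeout_ms)
{
    assert(rd != NULL);

    int64_t deadline = now_ms() + timeout_ms;
    do {
        if (rd(ctx) == expected)
            return 0;
    } while (now_ms() < deadline);

    return -EBUSY;
}
```

In this model, raising the timeout from 2000 to 3000 only helps if the register eventually reaches the expected value; if it never does, the call still ends in the timeout path, just later. That would be consistent with the longer wait not curing the freeze, though the thread does not confirm the cause.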

Comment 19 Nic Soudée 2018-12-02 23:12:51 UTC

Created attachment 142692 [details] [review]
Handle INTR 0x00800000 in gf100_fifo_intr

Attached is a patch I applied to kernel 4.19.5 in a desperate attempt to stop my Dell E6420 from randomly getting its video busted (the symptoms are very similar to those in Comment #3 of this ticket, which is why I'm posting here). I don't have any experience with such low-level programming, but I just pretended I knew what I was doing and cut and pasted a condition for that INTR 0x00800000 error, which pops up every time that catastrophic random event happens.

So far, my E6420 is working great despite receiving some of those INTRs, thanks to this patch. I am posting this in hopes it might be on the right track towards getting this fixed by someone who knows what they're doing...

Comment 20 Ilia Mirkin 2018-12-02 23:25:13 UTC

(In reply to Nic Soudée from comment #19)
> Created attachment 142692 [details] [review]
> Handle INTR 0x00800000 in gf100_fifo_intr
> 
> Attached is a patch I applied to kernel 4.19.5 to desperately thwart my DELL
> E6420 from randomly getting its video busted (very similar symptoms as
> Comment #3 of this ticket, and is why I'm posting here). I don't have any
> experience with such low-level programming but I just pretended I knew what
> I was doing and cut and pasted a condition for that INTR 0x00800000 error
> which pops up every time that catastrophic random event happens.
> 
> So far, my E6420 is working great despite receiving some of those INTRs,
> thanks to this patch. I am posting this in hopes it might be on the right
> track towards getting this fixed by someone who knows what they're doing...

That's a little surprising. The existing logic will mask out further interrupts by writing a 0 into the relevant bit of 2140 (which is the INTR_EN register, which controls which intr's get surfaced).

Your logic removes that, which means that you'll keep getting the unknown intr's, and also you add a read from 258c. That's unlikely to matter though.

However now if we *do* ever get an interrupt for that bit in 2100 after disabling the bit in 2140, then it'll be stuck forever. I suspect that the "& mask" should be removed at the beginning of that function [and thus the read from 2140].
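
One way to read the concern in comment 20 is with a toy model of the two registers. The sketch below is not the kernel's `gf100_fifo_intr`. `fifo_model` and `fifo_intr_model` are hypothetical, and "acknowledging" by clearing the bit in the pending word is a simplification assumed for the model. It only illustrates that once a bit has been cleared in the enable register (2140), a handler that starts from `pending & enabled` never sees that bit again, so it is never acknowledged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the two registers named in comment 20. */
struct fifo_model {
    uint32_t intr;    /* models 2100: pending interrupt bits */
    uint32_t intr_en; /* models 2140 (INTR_EN): bits that get surfaced */
};

/*
 * Hypothetical handler. `known` is the set of bits it knows how to
 * service. With `apply_mask` set, it looks only at enabled bits
 * (the "& mask" mentioned in comment 20). Returns the unknown bits seen.
 */
uint32_t fifo_intr_model(struct fifo_model *f, uint32_t known, bool apply_mask)
{
    assert(f != NULL);

    uint32_t stat = f->intr;
    if (apply_mask)
        stat &= f->intr_en;

    /* Service and acknowledge the bits we understand. */
    f->intr &= ~(stat & known);

    /* Unknown bits: stop surfacing them, then acknowledge them. */
    uint32_t unknown = stat & ~known;
    if (unknown) {
        f->intr_en &= ~unknown;
        f->intr &= ~unknown;
    }
    return unknown;
}
```

Starting from `intr = 0x00800000` with every bit enabled, the first masked call reports the unknown bit and clears it in both words. If the bit is raised again afterwards, a masked call returns 0 and leaves `intr` at `0x00800000`, while an unmasked call sees the bit and clears it.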

Comment 21 Martin Peres 2019-12-04 09:06:45 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/235.