Bug 98383 - X server is crashing/eats 100% cpu when turning on monitor
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2016-10-22 14:43 UTC by Mariusz Białończyk
Modified: 2017-04-03 15:42 UTC

Attachments
xorg.log for a crash (110.35 KB, text/x-log)
2016-10-22 14:43 UTC, Mariusz Białończyk
backtrace for 100% cpu (from remote machine) (119.62 KB, text/x-log)
2016-10-22 14:43 UTC, Mariusz Białończyk
backtrace for crash situation (from core) (7.54 KB, text/x-log)
2016-10-22 14:44 UTC, Mariusz Białończyk
Xorg.log for the 100% cpu (121.97 KB, text/x-log)
2016-10-22 14:45 UTC, Mariusz Białończyk
xorg.conf (3.63 KB, text/plain)
2016-10-22 14:45 UTC, Mariusz Białończyk

Description Mariusz Białończyk 2016-10-22 14:43:14 UTC
Created attachment 127463
xorg.log for a crash

Hi,
I have a multi-head configuration (5 monitors). One of the monitors is a Samsung HDTV connected via HDMI to the Intel Skylake graphics.
Everything works fine until I turn that HDTV on.
When the X server is started with the TV already turned on, it is also OK (until I turn it off and on again).
In other words: the problem is triggered only when that monitor/TV is turned on.
Note: I don't suspect that the TV itself is failing; it worked for years connected to an NVIDIA GPU on similar hardware.
When Xinerama is enabled, the server crashes at the moment the TV is turned on. The GDB backtrace is in the attachment (backtrace-crash.log). The Xorg log for that case is Xorg-crash.log.

It may also be relevant that when I disable Xinerama, then at the same moment (when I turn on the TV), instead of an X server crash I get an endless loop that burns 100% of one of my cores. The process doing it is Xorg.
At the moment I turn on the TV (and it starts to eat the CPU), I can see in the Xorg log that it is trying to re-probe the EDID of all connected monitors, and I get plenty of lines like:

[ 65543.926] (II) NOUVEAU(2): EDID vendor "FUS", prod id 1818
[ 65543.926] (II) NOUVEAU(2): Using EDID range info for horizontal sync
[ 65543.926] (II) NOUVEAU(2): Using EDID range info for vertical refresh
[ 65543.926] (II) NOUVEAU(2): Printing DDC gathered Modelines:

Lines for intel don't show up because I disabled probing on intel with Option "HotPlug" "false", in the hope that it would stop eating the CPU (no luck).
Meanwhile I can work normally: no artifacts on screen and everything seems to work fine, but Xorg eats 100% of my CPU (until Xorg is restarted).
When I switch the VT to a console, it keeps running this endless loop.
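
For reference, the HotPlug setting mentioned above would look roughly like this in xorg.conf. This is only a sketch: the Identifier is a placeholder and is not taken from the attached xorg.conf, and any other settings in the real Device section are omitted.

```
Section "Device"
    Identifier "intel-igpu"          # placeholder name, assumed for illustration
    Driver     "intel"
    Option     "HotPlug" "false"     # do not notify clients of monitor hotplug events
EndSection
```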

To obtain additional information, I connected from a remote machine while Xorg was in this state and attached a debugger to the Xorg process to obtain a backtrace. I also recorded some stepping.
The full session is in backtrace-100_cpu.log and the corresponding log is Xorg-100cpu.log.
Comment 1 Mariusz Białończyk 2016-10-22 14:43:55 UTC
Created attachment 127464
backtrace for 100% cpu (from remote machine)
Comment 2 Mariusz Białończyk 2016-10-22 14:44:35 UTC
Created attachment 127465
backtrace for crash situation (from core)
Comment 3 Mariusz Białończyk 2016-10-22 14:45:21 UTC
Created attachment 127466
Xorg.log for the 100% cpu
Comment 4 Mariusz Białończyk 2016-10-22 14:45:35 UTC
Created attachment 127467
xorg.conf
Comment 5 Mariusz Białończyk 2016-10-22 14:46:41 UTC
Please also see https://bugs.freedesktop.org/show_bug.cgi?id=90482 in case it is related to my problem.
Comment 6 Mariusz Białończyk 2016-10-24 09:43:50 UTC
The 100% CPU usage is gone after I applied a dura patch from bug #90482 (https://bugs.freedesktop.org/show_bug.cgi?id=90482).

So at least the patch cures the 100% CPU usage problem, but it doesn't help with the server crash when Xinerama is enabled.
Comment 7 Mariusz Białończyk 2017-03-29 08:30:27 UTC
Hello again,
Guys, I recently switched to current Xorg and nouveau from git.
The 100% CPU problem seems to be gone (at least so far), but nouveau is still crashing the X server.

I still have to apply the following patch:
diff --git a/src/drmmode_display.c b/src/drmmode_display.c
index dd9fa27..a468223 100644
--- a/src/drmmode_display.c
+++ b/src/drmmode_display.c
@@ -1542,7 +1542,7 @@ drmmode_handle_uevents(ScrnInfoPtr scrn)
        if (!dev)
                return;

-       RRGetInfo(xf86ScrnToScreen(scrn), TRUE);
+       //RRGetInfo(xf86ScrnToScreen(scrn), TRUE);
        udev_device_unref(dev);
 }
 #endif

Otherwise, when I turn on the TV on the other Xorg instance (with the intel driver), Xorg crashes. Maybe nouveau is doing something wrong with the RandR event on the other Xorg instance?
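
A less drastic alternative to commenting the call out would be to guard it. The following is only a stand-alone model of that idea, assuming (the report does not confirm this) that the crash comes from re-probing on a screen where RandR is not set up. All names here (`mock_screen`, `mock_reprobe_outputs`, `mock_handle_hotplug`) are hypothetical and are not X server or nouveau API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a screen; this is NOT the X server's ScreenPtr. */
struct mock_screen {
    bool randr_enabled;  /* assumed flag: false when RandR is not set up on this screen */
    int  reprobe_count;  /* number of output re-probes performed so far */
};

/* Hypothetical stand-in for the re-probe that a call like RRGetInfo() triggers. */
void mock_reprobe_outputs(struct mock_screen *screen)
{
    screen->reprobe_count++;
}

/* Handle one hotplug event: re-probe only when RandR is usable on the screen.
 * Returns true if a re-probe ran, false if the event was skipped. */
bool mock_handle_hotplug(struct mock_screen *screen)
{
    assert(screen != NULL);

    if (!screen->randr_enabled)
        return false;  /* skip instead of calling into an uninitialized subsystem */

    mock_reprobe_outputs(screen);
    return true;
}
```

Unlike the commented-out call, a guard of this kind would keep hotplug re-probing working on screens where RandR is available.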

It is now over five months since I reported this. Is it really so hard to cooperate with me to get rid of this bug? I can try your patches and give you additional debug information, and I have programming skills.
Comment 8 Ilia Mirkin 2017-03-29 15:06:13 UTC
(In reply to Mariusz Białończyk from comment #7)
> It is now over five months since I reported this. Is it really
> so hard to cooperate with me to get rid of this bug? I can try your patches
> and give you additional debug information, and I have programming skills.

A few things to consider:

1. nouveau is largely volunteer-driven. Volunteers tend to do the things that interest them, not the things that interest non-contributors. (Which means that the end user may have to persuade them that their problem is more worthwhile to work on, and stomping your foot on the ground isn't the best way to achieve that.)

2. You initially filed the bug against a component that no nouveau developer monitors.

3. If you want to get help figuring this out yourself, you may want to try #nouveau and #dri-devel on freenode.
Comment 9 Mariusz Białończyk 2017-04-03 15:42:36 UTC
Fixed with:
https://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=e9418e434311336e905b70553a5ed740838d90ad

As for the 100% CPU endless loop: I cannot reproduce it anymore, so maybe it has been fixed by some other, unknown commit. Besides, it has its own bug report, https://bugs.freedesktop.org/show_bug.cgi?id=90482, so let's not mix the two up.

Thank you, Ilia!