Bug 78462 - [gm45 vga] display is black or 1 solid color (white, etc), xrandr reset fixes temporarily
Summary: [gm45 vga] display is black or 1 solid color (white, etc), xrandr reset fixes...
Status: CLOSED INVALID
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-08 22:56 UTC by Eric Johnson
Modified: 2017-07-24 22:54 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
Log_Conf_Dump_files_for_1_color_screen_on_3_PCs (212.03 KB, text/plain)
2014-05-08 22:56 UTC, Eric Johnson

Description Eric Johnson 2014-05-08 22:56:41 UTC
Created attachment 98716
Log_Conf_Dump_files_for_1_color_screen_on_3_PCs

We're having a problem with the i915 driver on openSUSE 13.1 with Lenovo M58 (type 7360) PCs in our company's kiosks.

Here is the issue:
Randomly, our touch screen kiosk display will change to a solid color. It looks like it takes what we think is the upper-left pixel and displays that color over the entire screen. We can start a VNC session into the kiosk and still see the entire screen just fine, but our customer standing in front of the kiosk can only see one color. If we reboot the PC everything comes back as it should. However, if we just restart the X server it does not fix the problem. We are able to connect a separate external VGA monitor and see the same output as the touchscreen, so it doesn't seem to be the touch screen hardware.

Also, up until recently, we had the same 67 kiosks using this touch screen running Windows XP for 3+ years without a single problem (due to XP's end of life we're moving everything to Linux). The same display running with our newer PC (Lenovo M71e) hasn't exhibited the issue. A couple of data points we've gathered:

1.  When we keep power to the display but power cycle the PC it DOES fix the problem.
2.  When we keep power to the display and just restart X it DOES NOT fix the issue.

This occurs randomly, but we're seeing it 2x per day with 67 kiosks
running the same setup. Interestingly, our kiosks running on a
later model Lenovo M71e do not experience this behavior, and I believe
that model has the Intel integrated HD Graphics 2000.

The following commands get things working again:
"xrandr --output VGA1 --off", then "xrandr --output VGA1 --mode 1024x768 --rate 60".

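The two-step workaround above could be wrapped in a small helper for use from a kiosk maintenance script. This is only a sketch: the function name `reset_output` and the `XRANDR` override variable are assumptions introduced here for illustration, and the output name, mode, and rate are simply the ones quoted above.

```shell
#!/bin/sh
# Sketch of the workaround quoted above: turn the output off, then
# re-enable it with an explicit mode and refresh rate.
# reset_output and the XRANDR variable are hypothetical names, not part
# of xrandr; XRANDR only exists so the commands can be dry-run with echo.
reset_output() {
    output=$1
    mode=$2
    rate=$3
    "${XRANDR:-xrandr}" --output "$output" --off &&
        "${XRANDR:-xrandr}" --output "$output" --mode "$mode" --rate "$rate"
}

# Example (needs a running X session and DISPLAY set):
#   reset_output VGA1 1024x768 60
```

Setting `XRANDR=echo` before calling the function prints the two commands' arguments instead of running them, which may be handy when checking the script on a machine without the affected display.
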
-- chipset: i915 kernel driver, 00:02.1 Display controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03) (for all three PCs)
-- system architecture: x86_64
-- xf86-video-intel/xserver/mesa/libdrm version:
X.Org X Server 1.14.3.901 (1.14.4 RC 1)
 
On k2247:
libdrm_intel1-2.4.46-3.2.2.x86_64
xf86-video-intel-2.99.905-1.1.x86_64

On k1417:
libdrm_intel1-2.4.46-3.2.2.x86_64
xf86-video-intel-2.99.906-12.1.x86_64

On k1558:
libdrm_intel1-2.4.46-3.2.2.x86_64
xf86-video-intel-2.99.906-12.1.x86_64
 

-- kernel version: 3.11.6-4-default on k2247, 3.14.1-1.geafcebd-default on k1417, k1558
-- Linux distribution: OpenSUSE 13.1
-- Machine or mobo model: Lenovo M58, type 7360, model CV7
-- Display connector: VGA

No error state for /sys/kernel/debug/dri/0/i915_error_state, or for /sys/kernel/debug/dri/64/i915_error_state on any of the 3 PCs.

The different X conf files for each PC together make up the xorg.conf for that PC.
The attachment has all the main files requested for each PC. Kiosk 1558 does not have the X conf and xrandr files.
Comment 1 Chris Wilson 2014-05-09 06:00:25 UTC
That the screen spontaneously dies would suggest a FIFO or SR underrun. I don't think the great WM updates touched g4x per se, but it is always worth checking drm-intel-next in case they did magically fix something. You should also disable output polling (use drm_kms_helper.poll=0). Also run with drm.debug=6 and note the time when the screen dies so that we can see if there is any event around that time.
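
Before waiting for the next failure, it may be worth confirming that both options really reached the running kernel. The following is a minimal sketch that looks for them in /proc/cmdline; `has_param` is a hypothetical helper name, and its optional second argument (a file to read instead of /proc/cmdline) is an assumption added only to make the check easy to try out.

```shell
#!/bin/sh
# has_param is a hypothetical helper: it succeeds if the exact token
# (for example drm.debug=6) appears on the kernel command line.
# The optional second argument names a file to read instead of /proc/cmdline.
has_param() {
    param=$1
    file=${2:-/proc/cmdline}
    for word in $(cat "$file" 2>/dev/null); do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Report on the two options suggested in this comment.
for p in drm_kms_helper.poll=0 drm.debug=6; do
    if has_param "$p"; then
        echo "$p: set"
    else
        echo "$p: missing"
    fi
done
```
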
Comment 2 Eric Johnson 2014-05-23 16:19:12 UTC
We did have a PC experience the issue with the drm_kms_helper.poll=0 and drm.debug=6 flags set. But the PC was rebooted and there was NO CRASH file at /sys/class/drm/card0/error. In fact no /sys directory at all.

Do we need to do something to enable crash dumps?  Thanks!

Here is the log output we did have:

[drm] GPU crash dump saved to /sys/class/drm/card0/error
[drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[drm:i965_irq_handler], hotplug event received, stat 0x00000300

[drm:i965_irq_handler], hotplug event received, stat 0x00000300
Comment 3 Eric Johnson 2014-05-23 16:39:43 UTC
Correcting my mistake in the last post:
The directory structure is present.  This is its output.

$ sudo cat /sys/class/drm/card0/error
no error state collected
Comment 4 Chris Wilson 2014-09-06 11:31:10 UTC
Ok, it is important to capture the error state before the next reboot (or else the contents are lost).
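
One way to avoid losing the error state is to copy it somewhere persistent as soon as the problem is noticed. The sketch below assumes the sysfs path and the "no error state collected" placeholder shown earlier in this report; `save_error_state`, its arguments, and the /var/tmp default are hypothetical choices made for illustration.

```shell
#!/bin/sh
# save_error_state is a hypothetical helper: copy the i915 error state to a
# timestamped file so it survives a reboot. It returns 1 when there is
# nothing to save (file unreadable, or it only holds the
# "no error state collected" placeholder shown in comment 3).
save_error_state() {
    src=${1:-/sys/class/drm/card0/error}
    dest_dir=${2:-/var/tmp}
    [ -r "$src" ] || return 1
    if [ "$(head -n 1 "$src")" = "no error state collected" ]; then
        return 1
    fi
    dest="$dest_dir/i915_error_$(date +%Y%m%d-%H%M%S).txt"
    # Print the destination path on success so callers can attach the file.
    cat "$src" > "$dest" && echo "$dest"
}

# Example (reading the sysfs file usually requires root):
#   save_error_state /sys/class/drm/card0/error /var/tmp
```
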
Comment 5 Jesse Barnes 2014-12-04 21:50:31 UTC
Yeah this does sound like a FIFO issue somewhere.  And the display tends to get really stuck when that happens, which matches your "restart X" not working experience.

Can you try a current kernel from drm-intel-nightly and see if the issue happens?

Can you force it to happen by putting a load on the system, especially a memory intensive one like lots of drawing and CPU activity in the background (some kind of number crunching maybe)?
Comment 6 Jesse Barnes 2015-03-03 20:08:45 UTC
I guess Eric has given up on us since we took too long. :(

