Bug 80008 - Random loss of video output, laptop connected via docking station to two external (DVI, VGA) monitors rotated in portrait mode
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Ville Syrjala
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-13 23:41 UTC by ilakast
Modified: 2017-07-24 22:53 UTC
CC List: 3 users

See Also:
i915 platform: I965GM
i915 features: GPU hang


Attachments
Output of /sys/class/drm/card0/error (928.18 KB, text/plain)
2014-06-16 20:23 UTC, ilakast
latest crash when opening a pdf file (evince) (949.95 KB, text/plain)
2014-06-29 16:05 UTC, ilakast

Description ilakast 2014-06-13 23:41:49 UTC
When video output is lost, Fn+F7 does not restore video to the internal laptop LCD either. The only workaround at the moment is to power off completely and reboot.

Jun 13 15:32:23 t61 kernel: [ 1118.433678] perf samples too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Jun 13 16:10:44 t61 kernel: [ 3419.215660] perf samples too long (5024 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
Jun 13 23:33:15 t61 kernel: [29969.988053] [drm] stuck on render ring
Jun 13 23:33:15 t61 kernel: [29969.988059] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Jun 13 23:33:15 t61 kernel: [29969.988061] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jun 13 23:33:15 t61 kernel: [29969.988063] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jun 13 23:33:15 t61 kernel: [29969.988064] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jun 13 23:33:15 t61 kernel: [29969.988066] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Jun 13 23:33:15 t61 kernel: [29969.989045] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xc464000 ctx 0) at 0xc464794
Jun 13 23:33:15 t61 kernel: [29970.060089] [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit banging on pin 2
Jun 13 23:33:15 t61 kernel: [29970.496078] [drm:i915_reset] *ERROR* Failed to reset chip.
Jun 13 23:33:42 t61 kernel: [29997.054531] ------------[ cut here ]------------
Jun 13 23:33:42 t61 kernel: [29997.054573] WARNING: CPU: 0 PID: 326 at /build/buildd/linux-3.13.0/drivers/gpu/drm/i915/intel_display.c:922 ass$
Jun 13 23:33:42 t61 kernel: [29997.054576] PLL state assertion failure (expected on, current off)
Jun 13 23:33:42 t61 kernel: [29997.054578] Modules linked in: bnep rfcomm bluetooth snd_hda_codec_analog hid_generic usbhid hid coretemp kvm_i$
Jun 13 23:33:42 t61 kernel: [29997.054624] CPU: 0 PID: 326 Comm: kworker/0:2 Not tainted 3.13.0-29-generic #53-Ubuntu
Jun 13 23:33:42 t61 kernel: [29997.054626] Hardware name: LENOVO 8896W8T/8896W8T, BIOS 7LETC9WW (2.29 ) 03/18/2011
Jun 13 23:33:42 t61 kernel: [29997.054633] Workqueue: kacpi_notify acpi_os_execute_deferred
Jun 13 23:33:42 t61 kernel: [29997.054635]  00000000 00000000 f2779cb4 c164f613 f2779cf4 f2779ce4 c10567ee f8db2210
Jun 13 23:33:42 t61 kernel: [29997.054643]  f2779d10 00000146 f8db0b64 0000039a f8d5ed03 f8d5ed03 00000001 00000001
Jun 13 23:33:42 t61 kernel: [29997.054649]  00000000 f2779cfc c1056843 00000009 f2779cf4 f8db2210 f2779d10 f2779d20
Jun 13 23:33:42 t61 kernel: [29997.054656] Call Trace:
Jun 13 23:33:42 t61 kernel: [29997.054664]  [<c164f613>] dump_stack+0x41/0x52
Jun 13 23:33:42 t61 kernel: [29997.054670]  [<c10567ee>] warn_slowpath_common+0x7e/0xa0
Jun 13 23:33:42 t61 kernel: [29997.054693]  [<f8d5ed03>] ? assert_pll+0x73/0x80 [i915]
Jun 13 23:33:42 t61 kernel: [29997.054714]  [<f8d5ed03>] ? assert_pll+0x73/0x80 [i915]
Jun 13 23:33:42 t61 kernel: [29997.054719]  [<c1056843>] warn_slowpath_fmt+0x33/0x40
Jun 13 23:33:42 t61 kernel: [29997.054741]  [<f8d5ed03>] assert_pll+0x73/0x80 [i915]
Jun 13 23:33:42 t61 kernel: [29997.054764]  [<f8d643ca>] intel_crtc_load_lut+0x19a/0x1b0 [i915]
Jun 13 23:33:42 t61 kernel: [29997.054785]  [<f8d5c3e5>] ? assert_panel_unlocked+0xa5/0xb0 [i915]
Jun 13 23:33:42 t61 kernel: [29997.054807]  [<f8d645ab>] i9xx_crtc_enable+0x1cb/0x3a0 [i915]
Jun 13 23:33:42 t61 kernel: [29997.054830]  [<f8d66bfc>] __intel_set_mode+0x75c/0x8b0 [i915]
Jun 13 23:33:42 t61 kernel: [29997.054855]  [<f8d6c1ba>] intel_modeset_setup_hw_state+0x9da/0xae0 [i915]
Jun 13 23:33:42 t61 kernel: [29997.054880]  [<f8d6dfb9>] intel_lid_notify+0x99/0xe0 [i915]
Jun 13 23:33:42 t61 kernel: [29997.054884]  [<c1659f91>] notifier_call_chain+0x41/0x60
Jun 13 23:33:42 t61 kernel: [29997.054889]  [<c10797ab>] __blocking_notifier_call_chain+0x3b/0x60
Jun 13 23:33:42 t61 kernel: [29997.054893]  [<c10797ef>] blocking_notifier_call_chain+0x1f/0x30
Jun 13 23:33:42 t61 kernel: [29997.054897]  [<c1394d16>] acpi_lid_send_state+0x78/0xa3
Jun 13 23:33:42 t61 kernel: [29997.054901]  [<c139510a>] acpi_button_notify+0x3c/0xd5
Jun 13 23:33:42 t61 kernel: [29997.054905]  [<c1370a26>] acpi_device_notify+0x16/0x18
Jun 13 23:33:42 t61 kernel: [29997.054910]  [<c137ec53>] acpi_ev_notify_dispatch+0x35/0x4a
Jun 13 23:33:42 t61 kernel: [29997.054913]  [<c136d1dd>] acpi_os_execute_deferred+0x11/0x1c
Jun 13 23:33:42 t61 kernel: [29997.054917]  [<c106eb4b>] process_one_work+0x11b/0x3b0
Jun 13 23:33:42 t61 kernel: [29997.054921]  [<c1063562>] ? mod_timer+0x112/0x1c0
Jun 13 23:33:42 t61 kernel: [29997.054925]  [<c106f749>] worker_thread+0xf9/0x380
Jun 13 23:33:42 t61 kernel: [29997.054928]  [<c106f650>] ? rescuer_thread+0x340/0x340
Jun 13 23:33:42 t61 kernel: [29997.054932]  [<c1074f41>] kthread+0xa1/0xc0
Jun 13 23:33:42 t61 kernel: [29997.054937]  [<c165d877>] ret_from_kernel_thread+0x1b/0x28
Jun 13 23:33:42 t61 kernel: [29997.054940]  [<c1074ea0>] ? kthread_create_on_node+0x150/0x150
Jun 13 23:33:42 t61 kernel: [29997.054943] ---[ end trace 5e7f4fcd51e4d9aa ]---
Jun 13 23:33:42 t61 kernel: [29997.054986] ------------[ cut here ]------------
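
A minimal sketch for pulling the hang-related lines out of a long syslog like the one above. The marker strings are copied from this log; the function name `find_hang_events` is hypothetical, and other kernel versions may word these messages differently.

```python
import re

# Marker strings copied from the log above; other kernels may phrase them differently.
HANG_MARKERS = (
    "stuck on render ring",
    "GPU crash dump saved",
    "Failed to reset chip",
    "PLL state assertion failure",
)

# Matches the "[29969.988053]" kernel timestamp (with optional leading spaces).
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def find_hang_events(lines):
    """Return (kernel_timestamp, marker) pairs for lines containing a known marker."""
    events = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # not a kernel log line with a timestamp
        for marker in HANG_MARKERS:
            if marker in line:
                events.append((float(match.group(1)), marker))
                break
    return events
```

Feeding it the lines of the syslog file gives a short timeline of the hang, the failed reset, and the later PLL assertion.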

The file /sys/class/drm/card0/error contains: no error state collected

drm version: 1.1.0 20060810

OS: Xubuntu
Comment 1 ilakast 2014-06-13 23:56:02 UTC
xrandr -q

Screen 0: minimum 320 x 200, current 2048 x 1280, maximum 32767 x 32767
LVDS1 connected (normal left inverted right x axis y axis)
   1400x1050      60.0 +   60.0     50.0  
   1280x1024      60.0  
   1280x960       60.0  
   1360x768       59.8     60.0  
   1152x864       60.0  
   1024x768       60.0  
   800x600        60.3     56.2  
   640x480        59.9  
VGA1 connected 1024x1280+1024+0 left (normal left inverted right x axis y axis) 376mm x 301mm
   1280x1024      60.0*+   75.0  
   1280x960       75.0     60.0  
   1152x864       75.0  
   1024x768       75.1     70.1     60.0  
   832x624        74.6  
   800x600        72.2     75.0     60.3     56.2  
   640x480        75.0     72.8     66.7     60.0  
   720x400        70.1  
DVI1 connected 1024x1280+0+0 left (normal left inverted right x axis y axis) 376mm x 301mm
   1280x1024      60.0*+
   1024x768       60.0  
   800x600        60.3  
   640x480        60.0  
   720x400        70.1  
VIRTUAL1 disconnected (normal left inverted right x axis y axis)
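
A rough sketch for summarising the `xrandr -q` output above when comparing setups between reports. The regular expression is an assumption based only on the line shapes shown here (plus the optional `primary` keyword that newer xrandr versions print); `parse_outputs` is a hypothetical helper, not part of any tool.

```python
import re

# Matches output header lines such as
#   "VGA1 connected 1024x1280+1024+0 left (normal left inverted ...)"
# The geometry group is absent for outputs that are connected but not enabled.
OUTPUT_LINE = re.compile(
    r"^(?P<name>\S+) (?P<state>connected|disconnected)"
    r"(?: primary)?"
    r"(?: (?P<w>\d+)x(?P<h>\d+)\+(?P<x>\d+)\+(?P<y>\d+))?"
    r"(?: (?P<rotation>normal|left|inverted|right))?"
)


def parse_outputs(xrandr_text):
    """Map output name -> connection state, geometry (w, h, x, y) and rotation."""
    outputs = {}
    for line in xrandr_text.splitlines():
        m = OUTPUT_LINE.match(line)
        if m is None:
            continue  # "Screen 0: ..." and indented mode lines are skipped
        geometry = None
        if m.group("w") is not None:
            geometry = tuple(int(m.group(k)) for k in ("w", "h", "x", "y"))
        outputs[m.group("name")] = {
            "connected": m.group("state") == "connected",
            "geometry": geometry,
            # xrandr omits the rotation word when an enabled output is unrotated
            "rotation": m.group("rotation") or ("normal" if geometry else None),
        }
    return outputs
```

For the output above this reports VGA1 and DVI1 as enabled and rotated left, and LVDS1 as connected but without an active geometry.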
Comment 2 ilakast 2014-06-14 00:12:35 UTC
Not much useful information in Xorg.0.log.old:

[ 29970.500] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[ 29970.500] (EE) intel(0): When reporting this, please include /sys/class/drm/card0/error and the full dmesg.
[ 30000.860] (II) intel(0): resizing framebuffer to 1400x1050
Comment 3 Chris Wilson 2014-06-14 06:56:39 UTC
The loss of display is a characteristic of a GPU hang on that machine. Ville has been looking at improving the GPU reset which may help, but really that is a secondary problem. The primary issue is the cause of the GPU hang, for which we need the /sys/class/drm/card0/error captured before you reboot.
Comment 4 ilakast 2014-06-14 10:37:23 UTC
Thank you for your input, Chris. I will try to think of a way to get the content of /sys/class/drm/card0/error with no display, before rebooting. Any ideas on how to achieve that in practice are most welcome.
Comment 5 ilakast 2014-06-16 20:23:55 UTC
Created attachment 101195
Output of /sys/class/drm/card0/error

I managed to ssh into the crashed laptop, so please find the file attached. Any feedback would be GREATLY appreciated.
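
For anyone else needing to grab the file over ssh before rebooting, a minimal sketch follows. The path and the "no error state collected" text come from this report; the function name `save_error_state` and the output file naming are hypothetical, and reading the file may require root.

```python
import time
from pathlib import Path

ERROR_STATE = Path("/sys/class/drm/card0/error")  # path from the kernel message
EMPTY_MARKER = "no error state collected"  # content when nothing was captured


def save_error_state(dest_dir, source=ERROR_STATE):
    """Copy the GPU error state into dest_dir.

    Returns the path of the saved copy, or None if no error state is present.
    """
    text = Path(source).read_text(errors="replace")
    if not text.strip() or EMPTY_MARKER in text[:200].lower():
        return None
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / time.strftime("i915-error-%Y%m%d-%H%M%S.txt")
    dest.write_text(text)
    return dest
```

The copy has to be made while the machine is still in the hung state, since the error state does not survive a reboot.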
Comment 6 ilakast 2014-06-18 12:35:03 UTC
To help with debugging: the crashes seem not to be that random after all. Most of them are connected to the Minitube application (version 2.1.6, the latest as of today), not necessarily during playback but also when stopping a video or when switching between fullscreen and windowed mode.
Comment 7 ilakast 2014-06-29 16:05:38 UTC
Created attachment 101979
latest crash when opening a pdf file (evince)

Output from /sys/class/drm/card0/error
Comment 8 ilakast 2014-06-29 16:12:53 UTC
(In reply to comment #7)
> Created attachment 101979
> latest crash when opening a pdf file (evince)
> 
> Output from /sys/class/drm/card0/error

This crash happened when opening a PDF file with evince, so my earlier statement about Minitube no longer holds. The crashes are not tied to a particular application.
Comment 9 Chris Wilson 2014-07-31 14:39:59 UTC
The batch buffer gets overwritten. This seems to happen as it is being read, but I couldn't spot the culprit within the batch, so I presume it is the render cache that gets flushed after userspace has written the new batch. You definitely want to test with the latest kernel and xf86-video-intel.
Comment 10 Jani Nikula 2014-09-08 13:48:03 UTC
(In reply to comment #9)
> You definitely want to test with the latest kernel and xf86-video-intel.

Reporter, please try them.
Comment 11 ncopa 2014-10-20 14:32:56 UTC
I have a very similar problem with i915 on an old Dell OptiPlex 745 running Ubuntu 14.04.

I found this bug by googling
"/build/buildd/linux-3.13.0/drivers/gpu/drm/i915/intel_display.c:922"

I can consistently reproduce it by opening
https://chrome.google.com/webstore/detail/user-agent-switcher-for-c/djflhoibgkdhkhhcedjiklpkjnoahfmg in Google Chrome.

I tested UXA instead of SNA; it made no difference.

So I searched for Ubuntu .deb packages of newer Linux kernels:

3.14.x - no change. same thing happens.
3.16.x - GPU crashes but kernel does not completely die. The window manager died though.
3.17.x - same as 3.16.

I also found a .deb package for xf86-video-intel-2.99.216 + git20141016, but the issue is still there.

I could try building xf86-video-intel from git and running git bisect, but I'd need a hint on where to start, e.g. which version would be a likely 'good' starting point.

I will not bisect the kernel though, not on this old box.
Comment 12 ncopa 2014-10-20 14:33:56 UTC
One more thing: it's 64-bit Ubuntu here (amd64).
Comment 13 Jesse Barnes 2015-03-30 20:46:32 UTC
Have you tried an updated kernel yet?  Getting a current dmesg and crash dump might help, assuming this bug still exists...
Comment 14 Jani Nikula 2015-10-23 09:50:54 UTC
Timeout, closing. Please reopen if the problem persists with latest kernels.

