Bug 83516

Summary: [SNB/HSW] *ERROR* too many voltage retries, give up
Product: DRI
Component: DRM/Intel
Status: CLOSED DUPLICATE
Severity: normal
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Chris Cheney <chris.cheney>
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: andreas.sturmlechner, intel-gfx-bugs, leho, nmcveity
Whiteboard:
i915 platform: HSW, SNB
i915 features: display/DP
Attachments:
Description                                                  Flags
dmesg output showing problem                                 none
dmesg for 3.17.0-994-201409130305 after switch back to x     none
dmesg info                                                   none

Description Chris Cheney 2014-09-05 06:24:22 UTC
Created attachment 105779 [details]
dmesg output showing problem

I have a Haswell i7-4770K desktop system with two HDMI ports and one DisplayPort connector. Two HDMI monitors are connected to the HDMI ports, and a third HDMI monitor is connected through an active DisplayPort-to-HDMI adapter. With earlier kernels I was getting the error "[drm] HPD interrupt storm detected on connector DP-1: switching from hotplug detection to polling" and the DisplayPort output did not work reliably. After trying tonight's drm-intel-nightly I get a different error: "*ERROR* too many voltage retries, give up". The three monitors work fine in this configuration under Windows, so I think it may be a driver issue.

[   48.237921] [drm:intel_dp_start_link_train] *ERROR* too many voltage retries, give up

There is also a backtrace in the dmesg output:

[   48.741864] WARNING: CPU: 0 PID: 256 at /home/apw/COD/linux/drivers/gpu/drm/i915/intel_dp.c:3649 intel_dp_link_down+0x223/0x260 [i915]()
[   48.741865] Modules linked in: bnep rfcomm bluetooth joydev hid_logitech_dj usbhid uas usb_storage hid snd_hda_codec_hdmi usblp nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel mxm_wmi snd_hda_codec_realtek aes_x86_64 snd_hda_codec_generic lrw gf128mul glue_helper ablk_helper cryptd i915 snd_hda_intel serio_raw snd_hda_controller snd_seq_midi snd_hda_codec snd_seq_midi_event snd_hwdep snd_rawmidi mei_me lpc_ich snd_pcm mei video snd_seq wmi drm_kms_helper tpm_infineon snd_seq_device drm mac_hid snd_timer snd soundcore ppdev lp parport psmouse firewire_ohci igb firewire_core ahci e1000e i2c_algo_bit dca libahci crc_itu_t ptp pps_core [last unloaded: parport_pc]
[   48.742032]  [<ffffffffc0152aa0>] ? drm_modeset_lock+0x40/0x100 [drm]
[   48.742039]  [<ffffffffc0144380>] drm_mode_set_config_internal+0x60/0x100 [drm]
[   48.742045]  [<ffffffffc0196583>] restore_fbdev_mode+0xd3/0x100 [drm_kms_helper]
[   48.742049]  [<ffffffffc019667c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2c/0x50 [drm_kms_helper]
[   48.742053]  [<ffffffffc0197d61>] drm_fb_helper_set_par+0x31/0x80 [drm_kms_helper]
[   48.742056]  [<ffffffffc0197cdc>] drm_fb_helper_hotplug_event+0xcc/0x120 [drm_kms_helper]
[   48.742071]  [<ffffffffc018f55b>] drm_kms_helper_hotplug_event+0x2b/0x40 [drm_kms_helper]

--

I am attaching a dmesg captured with 'drm.debug=0xe log_buf_len=4M', which, judging from other bug reports, seems to be the more useful form. The adapter/monitor was not attached at boot time in this dmesg; I plugged it in roughly 47s into the boot log.
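
For readers unfamiliar with the drm.debug bitmask, here is a minimal sketch of how the value 0xe breaks down. The bit names are an assumption based on the 3.x-era definitions in include/drm/drmP.h (DRM_UT_CORE, DRM_UT_DRIVER, DRM_UT_KMS, DRM_UT_PRIME); later kernels define additional bits, and the helper name decode_drm_debug is made up for this illustration.

```python
# Bit names assumed from the 3.x-era include/drm/drmP.h; newer kernels add more bits.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by the given mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0xe))  # prints ['DRIVER', 'KMS', 'PRIME']
```

Under that assumption, 0xe enables driver, KMS and PRIME messages while leaving the noisier core messages off.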

If you would like me to do further testing I'm available anytime. :-)
Comment 1 Chris Cheney 2014-09-05 06:40:55 UTC
The later set of messages at 128s, etc. are from when I switched back and forth between the console and X. The screen did seem to work at one point but then stopped working again, probably due to the switching.
Comment 2 Jani Nikula 2014-09-05 07:49:20 UTC
If you can bisect, it would be awesome.
Comment 3 Chris Cheney 2014-09-05 14:08:30 UTC
I'm not sure which point I should bisect down to. The previous kernel I tried, the current Ubuntu 14.04 one, didn't actually work either; the error message was just different.
Comment 4 Chris Wilson 2014-09-06 10:51:35 UTC
The "too many voltages" is the regression you want to pin down.
Comment 5 Chris Cheney 2014-09-06 19:27:00 UTC
I tested the oldest prebuilt drm-intel-nightly I could find on Ubuntu's kernel site, which was from 08/26 (d82af52b1766594ece621e427f1604194ca2b415), and it still had the issue. I'll see if I can bisect back far enough to not get the error.
Comment 6 Chris Cheney 2014-09-07 18:31:40 UTC
I now have it narrowed down to sometime in June. Hopefully I will have the offending commit by later tonight.
Comment 7 Chris Cheney 2014-09-08 02:19:57 UTC
Got it.

--

c79057922ed6c2c6df1214e6ab4414fea1b23db2 is the first bad commit
commit c79057922ed6c2c6df1214e6ab4414fea1b23db2
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Apr 16 16:56:09 2014 +0200

    drm/i915: Remove vblank wait from haswell_write_eld
    
    The pipe is off at that point in time, so a vblank wait is simply a
    50ms wait. Caught by Jesse's verbose "make vblank wait timeouts WARN"
    patch. We've probably had a few versions of this float around already.
    
    To document assumptions put a pipe assert into the same place. And
    also add a posting read.
    
    If we ever decide to update the eld and infoframes while the pipe is
    already on (e.g. for fastboot) then there's lots of work to do. So
    better properly document all the hidden assumptions.
    
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 a65e012daaeba6dc023720fec6058feb97105626 49ac5b7847710371efb3efd7c17a3847a500359c M	drivers


---

# git bisect log
git bisect start
# bad: [a1a6cc1d2ea9e3adf81faab87b834bc903856207] ata: pata_samsung_cf: removes s5pc100 related ata codes
git bisect bad a1a6cc1d2ea9e3adf81faab87b834bc903856207
# good: [52c324f8a87b336496d0f5e9d8dff1aa32bb08cd] cpuidle: Combine cpuidle_enabled() with cpuidle_select()
git bisect good 52c324f8a87b336496d0f5e9d8dff1aa32bb08cd
# good: [b77279bc2e81545b20824da701b349272a78e4e7] Merge tag 'sound-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound into next
git bisect good b77279bc2e81545b20824da701b349272a78e4e7
# good: [859862ddd2b6b8dee00498c015ab37f02474b442] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
git bisect good 859862ddd2b6b8dee00498c015ab37f02474b442
# good: [4da005cf1e30897520106114a8ce11a5aa558497] qlcnic: Pre-allocate DMA buffer used for minidump collection
git bisect good 4da005cf1e30897520106114a8ce11a5aa558497
# bad: [bc1dfff04a5d4064ba0db1fab13f84ab4f333d2b] Merge branch 'drm-nouveau-next' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next
git bisect bad bc1dfff04a5d4064ba0db1fab13f84ab4f333d2b
# bad: [4fa62c890cea83f28c30e1d5dc8fc86f61210280] drm/i915: Move buffer pinning and ring selection to intel_crtc_page_flip()
git bisect bad 4fa62c890cea83f28c30e1d5dc8fc86f61210280
# bad: [aeab0b5af7df88284d101abf8d121f0e913b81ff] drm/i915: disable runtime PM if RC6 is disabled
git bisect bad aeab0b5af7df88284d101abf8d121f0e913b81ff
# good: [13732ba7493fd4b28568f768ee12497e26a0c8af] drm/i915: move infoframe setting to after pll enable v3
git bisect good 13732ba7493fd4b28568f768ee12497e26a0c8af
# bad: [7e9ab4081e646fc317d0a87929a352f0e5082190] Merge branch 'drm-coverity-fixes' of git://people.freedesktop.org/~danvet/drm into drm-next
git bisect bad 7e9ab4081e646fc317d0a87929a352f0e5082190
# good: [23d0b13036d14257ae4d226209cd7845f25af8e0] drm/i915/bdw: Add 42ms delay for IPS disable
git bisect good 23d0b13036d14257ae4d226209cd7845f25af8e0
# good: [8268bd48af9aae5e079d3ba8403ae459ff06cbcb] drm/i2c/tda998x: Fix signed overflow issue
git bisect good 8268bd48af9aae5e079d3ba8403ae459ff06cbcb
# good: [a5c4d7bc187bd13bc11ac06bb4ea3a0d4001aa4d] drm/i915: Disable/enable planes as the first/last thing during modeset on ILK+
git bisect good a5c4d7bc187bd13bc11ac06bb4ea3a0d4001aa4d
# bad: [b87577b7c768683736eea28f70779e8c75b4df62] drm: try harder to avoid regression when merging mode bits
git bisect bad b87577b7c768683736eea28f70779e8c75b4df62
# bad: [885ac04ab3a226d28147853d6d98eee3897a5636] Merge tag 'drm-intel-next-2014-04-16' of git://anongit.freedesktop.org/drm-intel into drm-next
git bisect bad 885ac04ab3a226d28147853d6d98eee3897a5636
# bad: [c79057922ed6c2c6df1214e6ab4414fea1b23db2] drm/i915: Remove vblank wait from haswell_write_eld
git bisect bad c79057922ed6c2c6df1214e6ab4414fea1b23db2
# first bad commit: [c79057922ed6c2c6df1214e6ab4414fea1b23db2] drm/i915: Remove vblank wait from haswell_write_eld
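
A long bisect log like the one above is easier to check when summarized. The following is a rough sketch, not part of the original report, that counts the good and bad verdicts and pulls out the first bad commit; the function name summarize_bisect_log is hypothetical, and the sample is a few lines copied from the log.

```python
import re


def summarize_bisect_log(text):
    """Collect the good/bad verdicts and the first bad commit from a bisect log."""
    good, bad, first_bad = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        verdict = re.match(r"git bisect (good|bad) ([0-9a-f]{40})$", line)
        if verdict:
            (good if verdict.group(1) == "good" else bad).append(verdict.group(2))
            continue
        culprit = re.match(r"# first bad commit: \[([0-9a-f]{40})\]", line)
        if culprit:
            first_bad = culprit.group(1)
    return {"good": good, "bad": bad, "first_bad": first_bad}


SAMPLE = """\
git bisect start
# good: [a5c4d7bc187bd13bc11ac06bb4ea3a0d4001aa4d] drm/i915: Disable/enable planes as the first/last thing during modeset on ILK+
git bisect good a5c4d7bc187bd13bc11ac06bb4ea3a0d4001aa4d
# bad: [885ac04ab3a226d28147853d6d98eee3897a5636] Merge tag 'drm-intel-next-2014-04-16' of git://anongit.freedesktop.org/drm-intel into drm-next
git bisect bad 885ac04ab3a226d28147853d6d98eee3897a5636
# bad: [c79057922ed6c2c6df1214e6ab4414fea1b23db2] drm/i915: Remove vblank wait from haswell_write_eld
git bisect bad c79057922ed6c2c6df1214e6ab4414fea1b23db2
# first bad commit: [c79057922ed6c2c6df1214e6ab4414fea1b23db2] drm/i915: Remove vblank wait from haswell_write_eld
"""

summary = summarize_bisect_log(SAMPLE)
print(len(summary["good"]), len(summary["bad"]), summary["first_bad"])
```

Anyone who wants to repeat the bisection can also save the log to a file and feed it to git bisect replay in a kernel tree that contains these commits.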
Comment 8 Chris Cheney 2014-09-08 04:54:34 UTC
After tracking down the commit for this issue, I upgraded my motherboard BIOS, which appears to include a new GOP/VBIOS. After the update I don't get the error on last night's Ubuntu drm-intel-nightly daily build. However, DisplayPort is still not working, and I'm getting a backtrace that looks effectively the same as in the prior dmesg.

I don't know how to verify the GOP/VBIOS version; is there a way to do that with intel-gpu-tools? If the version is obtainable, it might be useful to put it into the dmesg output for debugging purposes, since some of these issues may be interactions between differing versions.

---

[   48.134639] ------------[ cut here ]------------
[   48.134680] WARNING: CPU: 0 PID: 1402 at /home/apw/COD/linux/drivers/gpu/drm/i915/intel_dp.c:3649 intel_dp_link_down+0x223/0x260 [i915]()
[   48.134706] Modules linked in: rfcomm bnep bluetooth joydev hid_logitech_dj usbhid uas hid usb_storage usblp intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm nls_iso8859_1 mxm_wmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek aesni_intel snd_hda_codec_generic aes_x86_64 snd_hda_codec_hdmi lrw gf128mul glue_helper ablk_helper cryptd serio_raw lpc_ich snd_hda_intel wmi snd_hda_controller mac_hid snd_hda_codec tpm_infineon snd_hwdep i915 snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device video snd_timer drm_kms_helper drm snd mei_me mei soundcore ppdev lp parport igb e1000e firewire_ohci i2c_algo_bit psmouse firewire_core dca ahci ptp crc_itu_t libahci pps_core [last unloaded: parport_pc]
[   48.134708] CPU: 0 PID: 1402 Comm: Xorg Not tainted 3.17.0-994-generic #201409060350
[   48.134709] Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD5H/Z87X-UD5H-CF, BIOS F10 08/21/2014
[   48.134711]  0000000000000e41 ffff8807ec19b6d8 ffffffff82795dc5 0000000000000286
[   48.134712]  0000000000000000 ffff8807ec19b718 ffffffff82074a1c 0000000000152487
[   48.134713]  ffff8807ec5b0000 ffff8807eb87a0e0 ffff8807ebca9000 0000000080000002
[   48.134714] Call Trace:
[   48.134722]  [<ffffffff82795dc5>] dump_stack+0x46/0x58
[   48.134726]  [<ffffffff82074a1c>] warn_slowpath_common+0x8c/0xc0
[   48.134728]  [<ffffffff82074a6a>] warn_slowpath_null+0x1a/0x20
[   48.134740]  [<ffffffffc0660723>] intel_dp_link_down+0x223/0x260 [i915]
[   48.134753]  [<ffffffffc06670e1>] intel_dp_complete_link_train+0x101/0x220 [i915]
[   48.134764]  [<ffffffffc065ead7>] intel_ddi_pre_enable+0x137/0x1b0 [i915]
[   48.134774]  [<ffffffffc064c40c>] haswell_crtc_enable+0xfc/0x310 [i915]
[   48.134785]  [<ffffffffc064a6df>] __intel_set_mode+0x31f/0x480 [i915]
[   48.134795]  [<ffffffffc064ea96>] intel_set_mode+0x16/0x30 [i915]
[   48.134805]  [<ffffffffc064f596>] intel_crtc_set_config+0x1e6/0x370 [i915]
[   48.134820]  [<ffffffffc0540aa0>] ? drm_modeset_lock+0x40/0x100 [drm]
[   48.134828]  [<ffffffffc0532380>] drm_mode_set_config_internal+0x60/0x100 [drm]
[   48.134833]  [<ffffffffc0599583>] restore_fbdev_mode+0xd3/0x100 [drm_kms_helper]
[   48.134837]  [<ffffffffc059967c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2c/0x50 [drm_kms_helper]
[   48.134841]  [<ffffffffc059ad61>] drm_fb_helper_set_par+0x31/0x80 [drm_kms_helper]
[   48.134844]  [<ffffffffc059acdc>] drm_fb_helper_hotplug_event+0xcc/0x120 [drm_kms_helper]
[   48.134846]  [<ffffffffc059ad79>] drm_fb_helper_set_par+0x49/0x80 [drm_kms_helper]
[   48.134857]  [<ffffffffc06583fa>] intel_fbdev_set_par+0x1a/0x60 [i915]
[   48.134861]  [<ffffffff82411343>] fb_set_var+0x283/0x3a0
[   48.134864]  [<ffffffff820ab190>] ? check_preempt_wakeup+0x110/0x210
[   48.134866]  [<ffffffff82408184>] fbcon_blank+0x1e4/0x2d0
[   48.134869]  [<ffffffff824960ae>] do_unblank_screen.part.21+0x9e/0x180
[   48.134871]  [<ffffffff824961d8>] do_unblank_screen+0x48/0x80
[   48.134873]  [<ffffffff8248b735>] complete_change_console+0x65/0xf0
[   48.134875]  [<ffffffff8248c8ec>] vt_ioctl+0x112c/0x11c0
[   48.134882]  [<ffffffffc052b610>] ? drm_setmaster_ioctl+0xe0/0xe0 [drm]
[   48.134886]  [<ffffffff8247fda8>] tty_ioctl+0x298/0x8f0
[   48.134890]  [<ffffffff82226f42>] ? fsnotify+0x1c2/0x280
[   48.134894]  [<ffffffff821fbe95>] do_vfs_ioctl+0x75/0x2c0
[   48.134897]  [<ffffffff821e9e16>] ? vfs_write+0x196/0x1f0
[   48.134899]  [<ffffffff82206545>] ? __fget_light+0x25/0x70
[   48.134901]  [<ffffffff821fc171>] SyS_ioctl+0x91/0xb0
[   48.134903]  [<ffffffff827a392d>] system_call_fastpath+0x1a/0x1f
[   48.134904] ---[ end trace a85f12aba661920f ]---
Comment 9 Jani Nikula 2014-09-11 16:35:36 UTC
It seems that the original bug is now gone with the BIOS upgrade, and you've filed bug 83600 about the remaining issue. Closing this one, thanks for the report. Please reopen if you can still reproduce the original issue in this report.
Comment 10 Chris Cheney 2014-09-13 07:02:48 UTC
Tested a newer nightly build and the "*ERROR* too many voltage retries, give up" is back. I'm not too surprised, since it appears to be timing related, at least going by what the commit message says.

I am attaching a new dmesg output. The timeline is as follows:

[    0.000000] - System booted up to X and logged into desktop.
[  194.838153] - plugged in monitor. screen actually worked
[  265.526157] - switched to console. screen stopped working
[  346.870641] - switched back to X. screen still not working

---

Interestingly, even when the monitor isn't lit, xrandr still sees it and the system is 'using' it without me being able to see anything. When I turn off the monitor and rerun xrandr, it shows as disconnected, which also seems a bit odd since the monitor isn't physically disconnected. Turning off one of the directly connected HDMI monitors doesn't cause xrandr to report it as disconnected.

$ xrandr
Screen 0: minimum 320 x 200, current 3840 x 1080, maximum 32767 x 32767
VGA1 disconnected (normal left inverted right x axis y axis)
HDMI1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 510mm x 290mm
   1920x1080      60.0*+   50.0     59.9  
   1920x1080i     60.1     50.0     60.0  
   1680x1050      59.9  
   1280x1024      60.0  
   1280x960       60.0  
   1152x864       60.0  
   1280x720       60.0     50.0     59.9  
   1024x768       60.0  
   800x600        60.3  
   720x576        50.0  
   720x480        60.0     59.9  
   640x480        60.0     59.9  
HDMI2 connected 1920x1080+1920+0 (normal left inverted right x axis y axis) 510mm x 290mm
   1920x1080      60.0*+   50.0     59.9  
   1920x1080i     60.1     50.0     60.0  
   1680x1050      59.9  
   1280x1024      60.0  
   1280x960       60.0  
   1152x864       60.0  
   1280x720       60.0     50.0     59.9  
   1024x768       60.0  
   800x600        60.3  
   720x576        50.0  
   720x480        60.0     59.9  
   640x480        60.0     59.9  
DP1 connected (normal left inverted right x axis y axis)
   1920x1080      60.0 +   50.0     59.9  
   1920x1080i     60.1     50.0     60.0  
   1680x1050      59.9  
   1280x1024      60.0  
   1280x960       60.0  
   1152x864       60.0  
   1280x720       60.0     50.0     59.9  
   1024x768       60.0  
   800x600        60.3  
   720x576        50.0  
   720x480        60.0     59.9  
   640x480        60.0     59.9  
HDMI3 disconnected (normal left inverted right x axis y axis)
VIRTUAL1 disconnected (normal left inverted right x axis y axis)
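
In the listing above, DP1 is reported as connected but, unlike HDMI1 and HDMI2, has no geometry (WxH+X+Y) on its status line. As a rough sketch, assuming the output format shown here, the check can be automated as follows; connected_but_inactive is a made-up helper name, and the sample contains only status lines copied from the listing.

```python
import re


def connected_but_inactive(xrandr_text):
    """Names of outputs that xrandr reports as connected but with no active geometry."""
    inactive = []
    for line in xrandr_text.splitlines():
        status = re.match(r"(\S+) connected\b(.*)", line)
        if not status:
            continue  # skips disconnected outputs, mode lines and the Screen line
        # An enabled output carries a WxH+X+Y geometry on its status line.
        if not re.search(r"\d+x\d+\+\d+\+\d+", status.group(2)):
            inactive.append(status.group(1))
    return inactive


SAMPLE = """\
Screen 0: minimum 320 x 200, current 3840 x 1080, maximum 32767 x 32767
VGA1 disconnected (normal left inverted right x axis y axis)
HDMI1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 510mm x 290mm
HDMI2 connected 1920x1080+1920+0 (normal left inverted right x axis y axis) 510mm x 290mm
DP1 connected (normal left inverted right x axis y axis)
HDMI3 disconnected (normal left inverted right x axis y axis)
"""

print(connected_but_inactive(SAMPLE))  # prints ['DP1']
```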
Comment 11 Chris Cheney 2014-09-13 07:03:57 UTC
Created attachment 106204 [details]
dmesg for 3.17.0-994-201409130305 after switch back to x
Comment 12 Jani Nikula 2015-01-29 13:19:41 UTC
Long time no updates, but plenty of changes in the driver. Please retest on current drm-intel-nightly.
Comment 13 Chris Cheney 2015-02-01 08:05:57 UTC
I'm not getting the ERROR with the build using 8b4216f91c7bf8d3459cadf9480116220bd6545e from today, but my DP connection is still not working. I'll update 83600 with the current dmesg.
Comment 14 Chris Cheney 2015-02-06 21:14:36 UTC
I was wrong: it is still happening, just not immediately at boot.

[163298.688152] [drm:intel_dp_start_link_train [i915]] *ERROR* too many voltage retries, give up
[330019.451587] [drm:intel_dp_start_link_train [i915]] *ERROR* too many voltage retries, give up
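
To see how long after boot these failures occur, the timestamps can be pulled out of a dmesg dump. The following is a minimal sketch that assumes the two message formats quoted in this report (with and without the "[i915]" module suffix); link_train_failures is a hypothetical helper name.

```python
import re

# Matches both "[drm:intel_dp_start_link_train]" and
# "[drm:intel_dp_start_link_train [i915]]" as quoted in this report.
FAILURE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+\[drm:intel_dp_start_link_train.*?\] "
    r"\*ERROR\* too many voltage retries"
)


def link_train_failures(dmesg_text):
    """Return the uptime in seconds of every 'too many voltage retries' error."""
    times = []
    for line in dmesg_text.splitlines():
        match = FAILURE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


SAMPLE = """\
[163298.688152] [drm:intel_dp_start_link_train [i915]] *ERROR* too many voltage retries, give up
[330019.451587] [drm:intel_dp_start_link_train [i915]] *ERROR* too many voltage retries, give up
"""

times = link_train_failures(SAMPLE)
print([round(t / 3600, 1) for t in times])  # prints [45.4, 91.7]
```

For the two lines above this gives roughly 45 and 92 hours of uptime, which fits the observation that the error no longer appears right at boot.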
Comment 15 ye.tian 2015-03-02 06:35:03 UTC
Kernel: drm-intel-testing-2015-02-27
Steps to reproduce on SNB:
1. Start X
2. xrandr --output DP1 --rotate left
3. Unplug and replug DP
Comment 16 ye.tian 2015-03-02 06:51:53 UTC
Created attachment 113907 [details]
dmesg info
Comment 17 yann 2016-06-08 17:29:59 UTC
Consolidating the "clock recovery fails with too many full/voltage retries" bugs into one bug.
Tracking of this issue continues in bug 96436.

*** This bug has been marked as a duplicate of bug 96436 ***
