Bug 109522 - drm:intel_dp_start_link_train [i915] *ERROR* failed to enable link training
Summary: drm:intel_dp_start_link_train [i915] *ERROR* failed to enable link training
Status: NEEDINFO
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-31 16:20 UTC by Chris Murphy
Modified: 2019-08-09 05:50 UTC (History)
4 users (show)

See Also:
i915 platform: SKL
i915 features: display/Other


Attachments
journalctl (1.73 MB, text/plain)
2019-01-31 16:30 UTC, Chris Murphy
no flags Details
acpidump (963.87 KB, text/plain)
2019-01-31 16:30 UTC, Chris Murphy
no flags Details
dmidecode (19.62 KB, text/plain)
2019-01-31 16:31 UTC, Chris Murphy
no flags Details
lspci (52.75 KB, text/plain)
2019-01-31 16:31 UTC, Chris Murphy
no flags Details
dmesg (9.87 MB, text/plain)
2019-02-23 05:29 UTC, Chris Murphy
no flags Details
dmesg with drm.debug=0x1e on both just console and within X (2.92 MB, text/x-log)
2019-06-21 04:01 UTC, howaboutsynergy
no flags Details
dmesg with drm.debug-0x1e on drm-tip (only within X) (2.10 MB, text/plain)
2019-06-27 09:56 UTC, howaboutsynergy
no flags Details
cleaner dmesg with drm.debug=0x1e on drm-tip (only within X) (1.42 MB, text/plain)
2019-06-27 10:07 UTC, howaboutsynergy
no flags Details
dmesg with drm.debug=0x1e and i915.disable_power_well=0 (only within X) (2.20 MB, text/plain)
2019-06-27 10:18 UTC, howaboutsynergy
no flags Details
Increase panel power cycle delay test (1.30 KB, patch)
2019-07-08 23:32 UTC, Manasi
no flags Details | Splinter Review
dmesg with patch from comment 33 and drm.debug=0xe (532.58 KB, text/plain)
2019-07-09 13:14 UTC, howaboutsynergy
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Chris Murphy 2019-01-31 16:20:55 UTC
Summary: Often during wake from sleep there is a brief display flicker (black to GDM login screen, back to black for about 1s, back to GDM login screen), and also there is this error in red text found in dmesg/journalctl -k

Jan 31 08:37:41 flap.local kernel: [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training

The flicker is cosmetic. And I'm not sure if the red text is even related to the flicker.


Reproducible: 75%

Hardware: 

HP Spectre (circa 2016), Firmware Version: F.42 Release Date: 10/25/2018, is current.

Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz

00:02.0 VGA compatible controller [0300]: Intel Corporation Skylake GT2 [HD Graphics 520] [8086:1916] (rev 07) (prog-if 00 [VGA controller])

External display: NEC MultiSync PA241W. Connection is USB-C Thunderbolt 3 to HDMI adapter, HDMI to DVI adapter, DVI cable to display.
Comment 1 Chris Murphy 2019-01-31 16:30:03 UTC
Created attachment 143263 [details]
journalctl

kernel 5.0.0-0.rc4.git1.1.fc30.x86_64 is git 4aa9fc2a435a

drm.debug=0x1e is applied

These times are based on a cellphone; the laptop hasn't yet raised wifi to sync with NTP so its clock could be off by multiple seconds.
8:37:31 - approximate time spacebar is pushed to wake laptop
8:37:42 - approximate time of flicker

This journal contains everything from the time of wake through login; about 90 seconds.
Comment 2 Chris Murphy 2019-01-31 16:30:39 UTC
Created attachment 143264 [details]
acpidump

acpidump > acpidump.bin
Comment 3 Chris Murphy 2019-01-31 16:31:19 UTC
Created attachment 143265 [details]
dmidecode

dmidecode > dmidecode.txt
Comment 4 Chris Murphy 2019-01-31 16:31:50 UTC
Created attachment 143266 [details]
lspci

lspci -vvnn > lspcivvnn.txt
Comment 5 Chris Murphy 2019-01-31 16:34:40 UTC
The reported flicker happens on the built-in display. At the time of the flicker the external display is still in powersave, its amber light hasn't yet turned green.
Comment 6 Lakshmi 2019-01-31 17:02:39 UTC
Is this issue duplicate to Bug 105998?
Comment 7 Chris Murphy 2019-01-31 18:22:18 UTC
It is a dup of 105998 comment 3. But I'm not certain if it's a dup of that bug report.
Comment 8 Chris Murphy 2019-01-31 18:25:42 UTC
Actually in bug 105998 comment 4 suggests I open a new bug, as it's unclear if what I'm seeing in comment 3 is really the same problem as that bug. Anyway, I can test with i915.disable_power_well=0 if that's helpful.
Comment 9 Lakshmi 2019-02-22 09:12:21 UTC
Chris, can you add the full dmesg with kernal parameters drm.debug=0x1e log_buf_len=4M from boot? That might more information about the issue.
Comment 10 Chris Murphy 2019-02-23 05:29:40 UTC
Created attachment 143444 [details]
dmesg

5.0.0-0.rc7.git2.1 - This is linux git 2137397c92ae

wakeup and error happens at

[  362.025567] flap.local kernel: [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
Comment 11 howaboutsynergy 2019-05-22 11:33:20 UTC
Getting this on ArchLinux with kernel 5.1.3-g7cb9c5d341b9, i7-8700k CPU, no discrete graphics card, motherboard Asus Prime Z370-A, using BIOS version Version 1002 2018/07/06 8.76 MBytes (latest BIOS being Version 2001 2019/05/15 10.31 MBytes, please don't ask me to update), intel-ucode 20190514.a-1 (ie. the MDS one), mesa 19.0.4-1, NOTE: NOT using: xf86-video-intel, libva-intel-driver, libva1-intel-driver.
```
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Desktop)
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
```

I think this is why I also get:
```
$ xrandr --verbose|grep -C2 -w 85|grep --color=always Bad|head -1
	link-status: Bad 

$ grep -A1 DP-1 /home/user/.config/xfce4/xfconf/xfce-perchannel-xml/displays.xml|grep -e 'Active' -e 'false'
      <property name="Active" type="bool" value="false"/>
```

seemingly at/after the times when monitor was turned off(or went to sleep?):
```
[ 4373.521845] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[ 6979.762718] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
```

Something from `Xorg.0.log`:
```
...
[    32.021] (II) modeset(0): glamor X acceleration enabled on Mesa DRI Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2) 
...
[  2974.053] (II) modeset(0): Modeline "1280x800"x0.0   83.50  1280 1352 1480 1680  800 803 809 831 -hsync +vsync (49.7 kHz e)
[  4374.136] (WW) modeset(0): hotplug event: connector 85's link-state is BAD, tried resetting the current mode. You may be leftwith a black screen if this fails...
[  4374.719] (WW) modeset(0): hotplug event: connector 85's link-state is BAD, tried resetting the current mode. You may be leftwith a black screen if this fails...
[  4378.425] (II) modeset(0): EDID vendor "GSM", prod id 23305
...
[  4442.760] (WW) modeset(0): hotplug event: connector 85's link-state is BAD, tried resetting the current mode. You may be leftwith a black screen if this fails...
...
[  4445.432] (II) modeset(0): Modeline "1280x800"x0.0   83.50  1280 1352 1480 1680  800 803 809 831 -hsync +vsync (49.7 kHz e)
[  6980.360] (WW) modeset(0): hotplug event: connector 85's link-state is BAD, tried resetting the current mode. You may be leftwith a black screen if this fails...
[  6980.926] (WW) modeset(0): hotplug event: connector 85's link-state is BAD, tried resetting the current mode. You may be leftwith a black screen if this fails...
[  6980.976] (WW) modeset(0): hotplug event: connector 85's link-state is BAD, tried resetting the current mode. You may be leftwith a black screen if this fails...
[  6981.172] (II) modeset(0): EDID vendor "GSM", prod id 23305
...
[  6990.753] (WW) modeset(0): hotplug event: connector 85's link-state is BAD, tried resetting the current mode. You may be leftwith a black screen if this fails...
```
Comment 12 howaboutsynergy 2019-05-22 12:08:45 UTC
Can't manually reproduce it, I tried turning monitor off, then back on; tried:
```
sleep 1; xset dpms force suspend; sleep 1; xset dpms force off; sleep 1; xset dpms force standby
setterm --blank force
```

They all seemed to work so far, so the dmesg output with this patch:
```diff
$ git diff drivers/gpu/drm/i915/intel_dp_link_training.c|tee
diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
index b59c87daa4f7..e69b2198af89 100644
--- a/drivers/gpu/drm/i915/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
@@ -87,6 +87,7 @@ intel_dp_set_link_train(struct intel_dp *intel_dp,
 	ret = drm_dp_dpcd_write(&intel_dp->aux, DP_TRAINING_PATTERN_SET,
 				buf, len);
 
+  DRM_ERROR("ret=%d len=%d\n", ret, len);
 	return ret == len;
 }
 
```

apparently right after the monitor turns on, I get these 5,5,1 each time on dmesg:
```
$ dmesg --level err
[    3.844322] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[    3.846020] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[    3.848732] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1
[   56.871626] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[   56.873573] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[   56.874693] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1
[   58.212665] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[   58.214381] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[   58.217124] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1
[  103.393851] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  103.395610] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  103.398358] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1
```

So this is when it works normally!

When it doesn't work..., I'll have to update in a next comment when I hit it again. Curious what ret&len I'll get.
Comment 13 howaboutsynergy 2019-05-22 12:15:45 UTC
ok I got it:
1. first turn off monitor
2. while monitor is off, run that `monoff` script from the prev. comment, yes this:
```
sleep 1; xset dpms force suspend; sleep 1; xset dpms force off; sleep 1; xset dpms force standby
setterm --blank force
```
3. wait a few seconds to be sure
4. press Ctrl key to wake up desktop
5. turn on monitor

got:
```
[  694.869217] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=-110 len=5
[  694.869243] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  695.138593] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=-110 len=1
[  696.768921] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  696.770645] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  696.773386] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1
```

Ok I tried it again, but left 10 seconds before actually doing step 5:

```
[  887.319215] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=-110 len=5
[  887.319254] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  887.588579] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=-110 len=1
[  898.050275] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  898.052012] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  898.054707] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1
```

So looks like link is Good afterwards(after step 5 that is), but something keeps track of link as being still Bad (via xrandr) if it once was Bad (10 seconds ago).
Comment 14 howaboutsynergy 2019-05-22 12:46:45 UTC
linux-stable/usr/include/asm-generic/errno.h
#define ETIMEDOUT 110 /* Connection timed out */                                                                                
that's the -110 for the ret

it obviously times out because the monitor is physically(from the button) off.
But as soon as it's turned on, all is good again, however that `xrandr --verbose` still reports connection as Bad... unsure if this is an issue in something else.
Comment 15 howaboutsynergy 2019-05-22 19:36:07 UTC
ok, this is weird, I can't reproduce it anymore using Comment 13 steps, after the first boot since the PC was turned off completely.
Comment 16 howaboutsynergy 2019-05-22 23:53:25 UTC
tl;dr: for some reason running xrandr every second prevents this from happening.


ok the only reason that I couldn't reproduce it is because(/when) I was running the following in another terminal:
```
$ while true; do (date; cat /proc/uptime; xrandr --verbose|grep -w 85 -C 2|grep 'link-status: Bad'|| echo "good"); sleep 1; done
```

So while that is running, there's no way I can reproduce it with Comment 13 steps!
As soon as I C-c (aka stop) it, bam, steps are good to reproduce.
Start that while again and again can't reproduce!

So, I guess that's a workaround?
Comment 17 Lakshmi 2019-05-23 05:47:43 UTC
@Manasi, can you help here?
Comment 18 howaboutsynergy 2019-06-16 06:50:18 UTC
ok I have another way to reproduce  that bypasses the xrandr run, and it involves switching step 1 with step 2 in Comment 13. So with this, it doesn't matter if xrandr is ran in a `while` loop, the -110 will happen nonetheless. Kernel tested 5.1.10

The monitor is LG(something) and is connected via the DisplayPort to the ASUS Prime Z370-A motherboard connector and I've no dedicated gfx card, it's using the integrated one from Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz (oh I see I've mentioned this in Comment 11 already, the BIOS is the same version still because the next one is probably trojaned xD)


Ok so to summarize, steps from Comment 13 will only work(to reproduce the issue) if you're not running xrandr in a while loop like:
$ while true; do (date; cat /proc/uptime; xrandr --verbose --dryrun --nograb|grep -w 85 -C 2|grep 'link-status: Bad'|| echo "good"); sleep 1; done
but, if you insist on running xrandr like that(or even without --dryrun --nograb) then simply switching step 1 with step 2(aka run 'monoff' first then turn off monitor) then xrandr will have no effect and the issue can be reproduced 100% (or so it seems, to me).

By reproduce the issue I mean:
[ 3946.388708] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=-110 len=5
[ 3946.388736] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[ 3946.658135] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=-110 len=1
...
[ 3950.951159] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[ 3950.952869] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[ 3950.955573] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1


And in the case when xrandr is ran in a `while` loop using Comment 13 steps (without swapping steps 1 and 2), only the last 3 dmesg messages from above appear, meaning the issue doesn't occurr.

I'm up for trying any patches/tries that you want me to test, even if they lockup/oops/panic, I have kdump set up and we'll get nice stacktrace and stuff;-) just like in this bug for example: https://bugzilla.kernel.org/show_bug.cgi?id=203849

Cheers!
Comment 19 howaboutsynergy 2019-06-21 03:19:40 UTC
`i915.disable_power_well=0` does not help, ie.:

```
$ dmesg|grep -e '0000:00:02.0' -e 'i915'
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux-stable root=UUID=2b8b9ab8-7ac5-4586-aa42-d7ffb12de92a rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=256M panic=0 page_poison=1 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120 mitigations=off nospectre_v1 nospectre_v2 spectre_v2=off nospec_store_bypass_disable kvm-intel.vmentry_l1d_flush=never l1tf=off nopti pti=off no_stf_barrier noibrs noibpb ssbd=force-off spectre_v2_user=off noretpoline mds=off rd.log=all noefi cpuidle.governor=menu zram.num_devices=3 zswap.enabled=0 zswap.same_filled_pages_enabled=1 zswap.compressor=zstd zswap.max_pool_percent=40 zswap.zpool=z3fold i915.alpha_support=1 i915.fastboot=1 i915.disable_power_well=0
[    0.351304] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux-stable root=UUID=2b8b9ab8-7ac5-4586-aa42-d7ffb12de92a rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=256M panic=0 page_poison=1 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120 mitigations=off nospectre_v1 nospectre_v2 spectre_v2=off nospec_store_bypass_disable kvm-intel.vmentry_l1d_flush=never l1tf=off nopti pti=off no_stf_barrier noibrs noibpb ssbd=force-off spectre_v2_user=off noretpoline mds=off rd.log=all noefi cpuidle.governor=menu zram.num_devices=3 zswap.enabled=0 zswap.same_filled_pages_enabled=1 zswap.compressor=zstd zswap.max_pool_percent=40 zswap.zpool=z3fold i915.alpha_support=1 i915.fastboot=1 i915.disable_power_well=0
[    2.757573] pci 0000:00:02.0: [8086:3e92] type 00 class 0x030000
[    2.757741] pci 0000:00:02.0: reg 0x10: [mem 0xde000000-0xdeffffff 64bit]
[    2.757906] pci 0000:00:02.0: reg 0x18: [mem 0xc0000000-0xcfffffff 64bit pref]
[    2.758133] pci 0000:00:02.0: reg 0x20: [io  0xf000-0xf03f]
[    2.758318] device: '0000:00:02.0': device_add
[    2.758446] bus: 'pci': add device 0000:00:02.0
[    2.758609] PM: Adding info for pci:0000:00:02.0
[    2.904428] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    5.653546] bus: 'pci': add driver i915
[    5.653711] bus: 'pci': driver_probe_device: matched device 0000:00:02.0 with driver i915
[    5.653938] bus: 'pci': really_probe: probing driver i915 with device 0000:00:02.0
[    5.654166] i915 0000:00:02.0: no default pinctrl state
[    5.654329] i915 0000:00:02.0: runtime IRQ mapping not provided by arch
[    5.657558] i2c i2c-1: adapter [i915 gmbus dpc] registered
[    5.658289] i2c i2c-2: adapter [i915 gmbus dpb] registered
[    5.658975] i2c i2c-3: adapter [i915 gmbus dpd] registered
[    5.659613] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[    5.944140] device: 'i915': device_add
[    5.944304] bus: 'event_source': add device i915
[    5.944466] PM: Adding info for event_source:i915
[    5.948208] [drm] Initialized i915 1.6.0 20190207 for 0000:00:02.0 on minor 0
[    5.952289] i915 0000:00:02.0: adding component (ops i915_audio_component_bind_ops [i915])
[    5.952878] snd_hda_intel 0000:00:1f.3: found component 0000:00:02.0, duplicate 0
[    5.953119] snd_hda_intel 0000:00:1f.3: binding 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[    5.953732] snd_hda_intel 0000:00:1f.3: Linked as a consumer to 0000:00:02.0
[    5.953909] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[    5.954339] driver: 'i915': driver_bound: bound to device '0000:00:02.0'
[    5.954507] bus: 'pci': really_probe: bound device 0000:00:02.0 to driver i915
[    5.995964] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[    5.997663] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[    6.000374] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1
[    6.062740] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
```

```
...
[    5.648405] Setting dangerous option disable_power_well - tainting kernel
...
[  140.419975] gpg-agent[1102]: handler 0x7a423a6ff700 for fd 10 started
[  140.477095] gpg-agent[1102]: handler 0x7a423a6ff700 for fd 10 terminated
[  200.482440] gpg-agent[1102]: handler 0x7a4239efe700 for fd 10 started
[  200.543903] gpg-agent[1102]: handler 0x7a4239efe700 for fd 10 terminated
[  201.779269] user[2626]: '/home/user/bin/monoff' is executing: 'sleep 1'
[  202.782765] user[2633]: '/home/user/bin/monoff' is executing: 'xset dpms force suspend'
[  202.886388] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D0
[  202.898494] snd_hda_intel 0000:00:1f.3: PME# disabled
[  202.911455] snd_hda_intel 0000:00:1f.3: PME# enabled
[  202.923592] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
[  202.946158] user[2635]: '/home/user/bin/monoff' is executing: 'sleep 1'
[  203.900374] user[2642]: '/home/user/bin/monoff' is executing: 'xset dpms force off'
[  204.009111] user[2644]: '/home/user/bin/monoff' is executing: 'sleep 1'
[  205.015169] user[2651]: '/home/user/bin/monoff' is executing: 'xset dpms force standby'
[  205.124020] user[2653]: '/home/user/bin/monoff' is executing: 'setterm --blank force'
[  220.715002] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=-110 len=5
[  220.715016] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
[  220.984404] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=-110 len=1
[  220.984787] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D0
[  220.996593] snd_hda_intel 0000:00:1f.3: PME# disabled
[  221.009573] snd_hda_intel 0000:00:1f.3: PME# enabled
[  221.022464] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
[  221.320977] i2c i2c-2: NAK from device addr 0x50 msg #0
[  221.324827] i2c i2c-1: NAK from device addr 0x50 msg #0
[  221.328677] i2c i2c-3: NAK from device addr 0x50 msg #0
[  221.600815] i2c i2c-2: NAK from device addr 0x50 msg #0
[  221.604597] i2c i2c-1: NAK from device addr 0x50 msg #0
[  221.608409] i2c i2c-3: NAK from device addr 0x50 msg #0
[  221.904445] i2c i2c-2: NAK from device addr 0x50 msg #0
[  221.908255] i2c i2c-1: NAK from device addr 0x50 msg #0
[  221.912065] i2c i2c-3: NAK from device addr 0x50 msg #0
[  222.183955] i2c i2c-2: NAK from device addr 0x50 msg #0
[  222.187592] i2c i2c-1: NAK from device addr 0x50 msg #0
[  222.191224] i2c i2c-3: NAK from device addr 0x50 msg #0
[  222.544129] i2c i2c-2: NAK from device addr 0x50 msg #0
[  222.547899] i2c i2c-1: NAK from device addr 0x50 msg #0
[  222.551550] i2c i2c-3: NAK from device addr 0x50 msg #0
[  222.585170] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D0
[  222.596599] snd_hda_intel 0000:00:1f.3: PME# disabled
[  222.609442] snd_hda_intel 0000:00:1f.3: PME# enabled
[  222.621440] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
[  232.057307] i2c i2c-2: NAK from device addr 0x50 msg #0
[  232.366672] i2c i2c-2: NAK from device addr 0x50 msg #0
[  232.378634] i2c i2c-2: NAK from device addr 0x50 msg #0
[  232.382283] i2c i2c-1: NAK from device addr 0x50 msg #0
[  232.385931] i2c i2c-3: NAK from device addr 0x50 msg #0
[  232.397218] i2c i2c-2: NAK from device addr 0x50 msg #0
[  232.400973] i2c i2c-1: NAK from device addr 0x50 msg #0
[  232.404723] i2c i2c-3: NAK from device addr 0x50 msg #0
[  232.443129] i2c i2c-2: NAK from device addr 0x50 msg #0
[  232.446689] i2c i2c-1: NAK from device addr 0x50 msg #0
[  232.450356] i2c i2c-3: NAK from device addr 0x50 msg #0
[  232.457487] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  232.459178] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  232.461879] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1
[  232.462209] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D0
[  232.473474] snd_hda_intel 0000:00:1f.3: PME# disabled
[  232.486275] snd_hda_intel 0000:00:1f.3: PME# enabled
[  232.497460] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
[  260.543548] gpg-agent[1102]: handler 0x7a423a6ff700 for fd 10 started
[  260.627503] gpg-agent[1102]: handler 0x7a423a6ff700 for fd 10 terminated
[  301.490986] user[3150]: okiedokie


```

at [  220.715002] I pressed Ctrl key to wake up the software but the monitor was still off! got the ret=-110 messages.
10 seconds later(so at [  232.057307] I'm guessing) I turned on monitor, got the normal ret=5 messages.

`drm.debug=0x1e` spews just too many messages, not sure what to do with it... not even sure if it would log all of them during the test or they would just get overwritten due to the ring property of the dmesg(kmsg?) buffer. Maybe I'll retry.
Comment 20 howaboutsynergy 2019-06-21 04:01:20 UTC
Created attachment 144603 [details]
dmesg with drm.debug=0x1e on both just console and within X

only while inside X (ie. startx) `drm.debug=0x1e` constantly generates dmesg spam

I'm attaching full dmesg log (ring buffer didn't wrap around and thus no overwrites) with drm.debug=0x1e

First I tried in console only (tty1) and the issue doesn't happen there, so this is good:
[    6.032810] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
however on console/tty1 , it requires running `monon` in order to unblank! Pressing enter didn't do anything, so between these two messages I couldn't see what I was typing:
[  222.564063] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
[  229.562650] user[1383]: '/home/user/bin/monon'(on /dev/tty1) is executing: 'sleep 1'

I guess at
[  395.795237] [drm:drm_atomic_state_init [drm]] Allocated atomic state 0000000023bfb3ad
I ran `startx` and that's when the spam started.

At about
[  415.457985] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
I ran a normal monoff then turned on monitor test, so this wouldn't have caused the -110 (link training issue) - I actually forgot (sort of) the steps I was supposed to do here, so ignore this.

I did it right at:
[  443.388949] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=-110 len=5

so between these two I waited: after pressing the key(433) to then turning on monitor(441, 8 seconds later):
[  433.005944] [drm:__drm_atomic_state_free [drm]] Freeing atomic state 0000000076021904
[  441.745985] [drm:drm_atomic_state_init [drm]] Allocated atomic state 0000000076021904

and like 2 sec later is the -110 mentioned above.

Looks like it's those:
[  441.820989] [drm:intel_dp_aux_xfer [i915]] dp_aux_ch timeout status 0x7d4003ff
that Manasi mentions in https://bugs.freedesktop.org/show_bug.cgi?id=105998#c5
however since I've tried with `i915.disable_power_well=0` in my previous comment(albeit without the `drm.debug=0x1e` - does it matter though?!) and it had no effect, I'm out of ideas..

-------
monon is essentially this:
#!/bin/bash

roon() {
  local ec
  logger "'$0'(on $(tty)) is executing: '$*'"
  "$@" ; ec="$?"
  if test "$ec" != "0"; then
    logger "'$0' failed(ec='$ec') executing: '$*'"
  fi
  return "$ec"
}

                                                                                                                                
roon sleep 1
roon xset dpms force on
roon setterm --blank poke

#to also get back from blanking
roon xset s reset

-------
monoff is this:
#!/bin/bash
  
roon() {                                                                                                                        
  local ec
  logger "'$0'(on $(tty)) is executing: '$*'"
  "$@" ; ec="$?"
  if test "$ec" != "0"; then
    logger "'$0' failed(ec='$ec') executing: '$*'"
  fi
  return "$ec"
}

roon sleep 1
roon xset dpms force suspend
roon sleep 1
roon xset dpms force off
roon sleep 1
roon xset dpms force standby
roon setterm --blank force

-------
Comment 21 howaboutsynergy 2019-06-21 04:04:09 UTC
correction:
[    6.032810] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
should be:
[  222.488296] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
Comment 22 howaboutsynergy 2019-06-21 04:26:43 UTC
I guess it just needs a little more time:

[  125.566788] i2c i2c-2: NAK from device addr 0x50 msg #0
[  125.570588] i2c i2c-1: NAK from device addr 0x50 msg #0
[  125.574389] i2c i2c-3: NAK from device addr 0x50 msg #0
[  130.895667] user[1498]: '/home/user/bin/monoff'(on /dev/pts/1) is executing: 'sleep 1'
[  131.904024] user[1506]: '/home/user/bin/monoff'(on /dev/pts/1) is executing: 'xset dpms force suspend'
[  132.011136] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  132.019006] user[1509]: '/home/user/bin/monoff'(on /dev/pts/1) is executing: 'sleep 1'
[  133.027453] user[1517]: '/home/user/bin/monoff'(on /dev/pts/1) is executing: 'xset dpms force off'
[  133.138670] user[1520]: '/home/user/bin/monoff'(on /dev/pts/1) is executing: 'sleep 1'
[  134.147175] user[1528]: '/home/user/bin/monoff'(on /dev/pts/1) is executing: 'xset dpms force standby'
[  134.258545] user[1531]: '/home/user/bin/monoff'(on /dev/pts/1) is executing: 'setterm --blank force'
[  138.779003] snd_hda_intel 0000:00:1f.3: PME# enabled
[  138.790413] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
[  151.416744] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 887 retries
[  151.417491] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(2) after 0 retries
[  151.417637] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(2) after 0 retries
[  151.417806] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(5) after 0 retries
[  151.417822] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  151.418056] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.418230] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(6) after 0 retries
[  151.418389] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(4) after 0 retries
[  151.418623] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.418797] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(6) after 0 retries
[  151.418956] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(4) after 0 retries
[  151.419189] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.419363] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(6) after 0 retries
[  151.419531] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(5) after 0 retries
[  151.419547] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=5 len=5
[  151.420080] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.420254] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(6) after 0 retries
[  151.420413] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(4) after 0 retries
[  151.420948] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.421121] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(6) after 0 retries
[  151.421279] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(4) after 0 retries
[  151.421814] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.421987] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(6) after 0 retries
[  151.422268] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.422282] [drm:intel_dp_set_link_train [i915]] *ERROR* ret=1 len=1
[  151.422634] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D0
[  151.434403] snd_hda_intel 0000:00:1f.3: PME# disabled
[  151.447335] snd_hda_intel 0000:00:1f.3: PME# enabled
[  151.459283] i2c i2c-2: NAK from device addr 0x50 msg #0
[  151.460351] snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
[  151.488549] i2c i2c-2: NAK from device addr 0x50 msg #0
[  151.492329] i2c i2c-1: NAK from device addr 0x50 msg #0
[  151.496143] i2c i2c-3: NAK from device addr 0x50 msg #0
[  151.500188] i2c i2c-2: NAK from device addr 0x50 msg #0
[  151.504014] i2c i2c-1: NAK from device addr 0x50 msg #0
[  151.507808] i2c i2c-3: NAK from device addr 0x50 msg #0
[  151.620812] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.621097] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(15) after 0 retries
[  151.621265] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.621468] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.621632] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.621873] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(12) after 0 retries
[  151.622022] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.622172] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.628561] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.628726] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.628901] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.629110] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(6) after 0 retries
[  151.632209] i2c i2c-2: NAK from device addr 0x50 msg #0
[  151.638502] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.638779] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(15) after 0 retries
[  151.638933] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.639086] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.639242] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.639497] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(12) after 0 retries
[  151.639664] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.639826] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.639988] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.640183] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(6) after 0 retries
[  151.646707] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.646863] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.650080] i2c i2c-2: NAK from device addr 0x50 msg #0
[  151.653663] i2c i2c-1: NAK from device addr 0x50 msg #0
[  151.657264] i2c i2c-3: NAK from device addr 0x50 msg #0
[  151.658088] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.658377] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(15) after 0 retries
[  151.658542] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.658704] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.658860] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.659104] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(12) after 0 retries
[  151.659258] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.659439] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.659626] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.659830] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(6) after 0 retries
[  151.666278] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.666455] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 0 retries
[  151.670324] i2c i2c-2: NAK from device addr 0x50 msg #0
[  151.673941] i2c i2c-1: NAK from device addr 0x50 msg #0
[  151.677536] i2c i2c-3: NAK from device addr 0x50 msg #0
[  186.356944] gpg-agent[1081]: handler 0x7b7cdd4c7700 for fd 10 started
[  186.406135] gpg-agent[1081]: handler 0x7b7cdd4c7700 for fd 10 terminated
[  235.049835] user[2041]: okiedo


887 retries instead of just 32

however, if I use 3200 retries, there's a long(21sec?) blank screen at system startup due to something else timing out(also seen in attached dmesg in Comment 20):
[   11.444603] e1000e: net0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   32.961810] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries, giving up. First error: -110
[   32.962048] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(-110) after 3200 retries
[   32.962268] [drm] failed to retrieve link info, disabling eDP
[   32.965708] device: 'i915': device_add
[   32.965904] bus: 'event_source': add device i915
[   32.966060] PM: Adding info for event_source:i915


the "patch" I used was:

diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
index b59c87daa4f7..e69b2198af89 100644
--- a/drivers/gpu/drm/i915/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
@@ -87,6 +87,7 @@ intel_dp_set_link_train(struct intel_dp *intel_dp,
 	ret = drm_dp_dpcd_write(&intel_dp->aux, DP_TRAINING_PATTERN_SET,
 				buf, len);
 
+  DRM_ERROR("ret=%d len=%d\n", ret, len);
 	return ret == len;
 }
 
diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
index 54a6414c5d96..a6baf92dba8f 100644
--- a/drivers/gpu/drm/drm_dp_helper.c
+++ b/drivers/gpu/drm/drm_dp_helper.c
@@ -234,7 +234,7 @@ static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
 	 * aux i2c transactions but real world devices this wasn't
 	 * sufficient, bump to 32 which makes Dell 4k monitors happier.
 	 */
-	for (retry = 0; retry < 32; retry++) {
+	for (retry = 0; retry < 3200; retry++) {
 		if (ret != 0 && ret != -ETIMEDOUT) {
 			usleep_range(AUX_RETRY_INTERVAL,
 				     AUX_RETRY_INTERVAL + 100);
@@ -262,10 +262,11 @@ static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
 			err = ret;
 	}
 
-	DRM_DEBUG_KMS("Too many retries, giving up. First error: %d\n", err);
+	DRM_ERROR("Too many retries, giving up. First error: %d\n", err);
 	ret = err;
 
 unlock:
+	DRM_ERROR("Success(%d) after %u retries\n", ret, retry);
 	mutex_unlock(&aux->hw_mutex);
 	return ret;
 }
Comment 23 howaboutsynergy 2019-06-21 04:48:01 UTC
I guess this will vary...

[ 1410.660548] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 1189 retries

[ 1495.555416] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 1481 retries

[ 1533.515404] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 1381 retries

[ 1597.437236] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Success(1) after 965 retries

but if I don't wake up the software(or should I say video card?) by pressing a key before I turn on the monitor, the success is after 0 retries!
Comment 24 howaboutsynergy 2019-06-21 11:41:59 UTC
actually the variation is strictly due to the time I spent until I turned on the monitor! if I don't turn it on, it will still -110 

the only problem is that it keeps retrying for about 220 seconds(while the monitor is still off):
[  117.970253] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* drm_dp_dpcd_access starting, stack:
...
[  134.801604] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  151.638435] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  168.473882] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  185.388476] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  202.299276] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  219.451539] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  236.282573] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  253.066295] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  269.861933] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  286.623649] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  303.417173] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  320.280230] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
...
[  337.188082] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Too many retries(2000), giving up. First error: -110
[  337.188086] [drm:drm_dp_dpcd_access [drm_kms_helper]] *ERROR* Failed ret='-110' after 2000 retries
...
[  337.188284] i915 Wakeref count: 62
...

looking at the caller stack, it's for different callers, but it gives up after 13 such attempts, apparently (unless that's the time the video card went to sleep again, since I went away from the computer  - monitor was still turned off)
Comment 25 Lakshmi 2019-06-26 11:14:51 UTC
Apart from from the log messages, is there any other impact?
Have you verified the issue with drmtip?
Comment 26 howaboutsynergy 2019-06-26 19:26:01 UTC
> Apart from from the log messages, is there any other impact?

Yes, xrandr will report(until reboot, apparently) that link-status is Bad (previously, was Good) as soon as this issue happens once in the current session/boot.

and file contents of /home/user/.config/xfce4/xfconf/xfce-perchannel-xml/displays.xml
change from:
-      <property name="Active" type="bool" value="true"/>
to
+      <property name="Active" type="bool" value="false"/>

in context:
    <property name="DP-1" type="string" value="LG 27&quot;">
      <property name="Active" type="bool" value="false"/>                                                                       
      <property name="EDID" type="string" value="0fd458f22dac5decf2a8543c7190b329e7cf113c"/>
      <property name="Resolution" type="string" value="3840x2160"/>
      <property name="RefreshRate" type="double" value="59.996625"/>
      <property name="Rotation" type="int" value="0"/>
      <property name="Reflection" type="string" value="0"/>
      <property name="Primary" type="bool" value="true"/>
      <property name="Position" type="empty">
        <property name="X" type="int" value="0"/>
        <property name="Y" type="int" value="0"/>
      </property>
    </property>


Here's full xrandr output for the aforementioned `link-status: Bad`:
$ xrandr --verbose
Screen 0: minimum 320 x 200, current 3840 x 2160, maximum 8192 x 8192
DP-1 connected 3840x2160+0+0 (0x47) normal (normal left inverted right x axis y axis) 600mm x 340mm
	Identifier: 0x42
	Timestamp:  372563
	Subpixel:   unknown
	Gamma:      1.0:1.0:1.0
	Brightness: 1.0
	Clones:     HDMI-1
	CRTC:       0
	CRTCs:      0 1 2
	Transform:  1.000000 0.000000 0.000000
	            0.000000 1.000000 0.000000
	            0.000000 0.000000 1.000000
	           filter: 
	EDID: 
		00ffffffffffff001e6d095b1f2c0400
		011c0104b53c22789e3035a7554ea326
		0f50542108007140818081c0a9c0d1c0
		8100010101014dd000a0f0703e803020
		650c58542100001a286800a0f0703e80
		0890650c58542100001a000000fd0038
		3d1e8738000a202020202020000000fc
		004c4720556c7472612048440a20014f
		02031171449004030123090707830100
		00023a801871382d40582c4500585421
		00001e565e00a0a0a029503020350058
		542100001a0000000000000000000000
		00000000000000000000000000000000
		00000000000000000000000000000000
		00000000000000000000000000000000
		000000000000000000000000000000c8
	Content Protection: Undesired 
		supported: Undesired, Desired, Enabled
	max bpc: 12 
		range: (6, 12)
	Broadcast RGB: Automatic 
		supported: Automatic, Full, Limited 16:235
	audio: auto 
		supported: force-dvi, off, auto, on
	link-status: Bad 
		supported: Good, Bad
	CONNECTOR_ID: 85 
		supported: 85
	non-desktop: 0 
		range: (0, 1)
  3840x2160 (0x47) 533.250MHz +HSync -VSync *current +preferred
        h: width  3840 start 3888 end 3920 total 4000 skew    0 clock 133.31KHz
        v: height 2160 start 2214 end 2219 total 2222           clock  60.00Hz
  3840x2160 (0x48) 266.640MHz +HSync -VSync
        h: width  3840 start 3848 end 3992 total 4000 skew    0 clock  66.66KHz
        v: height 2160 start 2214 end 2219 total 2222           clock  30.00Hz
  2560x1440 (0x49) 241.500MHz +HSync -VSync
        h: width  2560 start 2608 end 2640 total 2720 skew    0 clock  88.79KHz
        v: height 1440 start 1443 end 1448 total 1481           clock  59.95Hz
  1920x1080 (0x4a) 148.500MHz +HSync +VSync
        h: width  1920 start 2008 end 2052 total 2200 skew    0 clock  67.50KHz
        v: height 1080 start 1084 end 1089 total 1125           clock  60.00Hz
  1920x1080 (0x4b) 148.500MHz +HSync +VSync
        h: width  1920 start 2008 end 2052 total 2200 skew    0 clock  67.50KHz
        v: height 1080 start 1084 end 1089 total 1125           clock  60.00Hz
  1920x1080 (0x4c) 148.352MHz +HSync +VSync
        h: width  1920 start 2008 end 2052 total 2200 skew    0 clock  67.43KHz
        v: height 1080 start 1084 end 1089 total 1125           clock  59.94Hz
  1600x900 (0x4d) 108.000MHz +HSync +VSync
        h: width  1600 start 1624 end 1704 total 1800 skew    0 clock  60.00KHz
        v: height  900 start  901 end  904 total 1000           clock  60.00Hz
  1280x1024 (0x4e) 108.000MHz +HSync +VSync
        h: width  1280 start 1328 end 1440 total 1688 skew    0 clock  63.98KHz
        v: height 1024 start 1025 end 1028 total 1066           clock  60.02Hz
  1280x800 (0x4f) 83.500MHz -HSync +VSync
        h: width  1280 start 1352 end 1480 total 1680 skew    0 clock  49.70KHz
        v: height  800 start  803 end  809 total  831           clock  59.81Hz
  1152x864 (0x50) 81.579MHz -HSync +VSync
        h: width  1152 start 1216 end 1336 total 1520 skew    0 clock  53.67KHz
        v: height  864 start  865 end  868 total  895           clock  59.97Hz
  1280x720 (0x51) 74.250MHz +HSync +VSync
        h: width  1280 start 1390 end 1430 total 1650 skew    0 clock  45.00KHz
        v: height  720 start  725 end  730 total  750           clock  60.00Hz
  1280x720 (0x52) 74.250MHz +HSync +VSync
        h: width  1280 start 1390 end 1430 total 1650 skew    0 clock  45.00KHz
        v: height  720 start  725 end  730 total  750           clock  60.00Hz
  1280x720 (0x53) 74.176MHz +HSync +VSync
        h: width  1280 start 1390 end 1430 total 1650 skew    0 clock  44.96KHz
        v: height  720 start  725 end  730 total  750           clock  59.94Hz
  1024x768 (0x54) 65.000MHz -HSync -VSync
        h: width  1024 start 1048 end 1184 total 1344 skew    0 clock  48.36KHz
        v: height  768 start  771 end  777 total  806           clock  60.00Hz
  800x600 (0x55) 40.000MHz +HSync +VSync
        h: width   800 start  840 end  968 total 1056 skew    0 clock  37.88KHz
        v: height  600 start  601 end  605 total  628           clock  60.32Hz
  720x480 (0x56) 27.027MHz -HSync -VSync
        h: width   720 start  736 end  798 total  858 skew    0 clock  31.50KHz
        v: height  480 start  489 end  495 total  525           clock  60.00Hz
  720x480 (0x57) 27.000MHz -HSync -VSync
        h: width   720 start  736 end  798 total  858 skew    0 clock  31.47KHz
        v: height  480 start  489 end  495 total  525           clock  59.94Hz
  640x480 (0x58) 25.200MHz -HSync -VSync
        h: width   640 start  656 end  752 total  800 skew    0 clock  31.50KHz
        v: height  480 start  490 end  492 total  525           clock  60.00Hz
  640x480 (0x59) 25.175MHz -HSync -VSync
        h: width   640 start  656 end  752 total  800 skew    0 clock  31.47KHz
        v: height  480 start  490 end  492 total  525           clock  59.94Hz
  640x480 (0x5a) 25.175MHz -HSync -VSync
        h: width   640 start  656 end  752 total  800 skew    0 clock  31.47KHz
        v: height  480 start  490 end  492 total  525           clock  59.94Hz
HDMI-1 disconnected (normal left inverted right x axis y axis)
	Identifier: 0x43
	Timestamp:  372563
	Subpixel:   unknown
	Clones:     DP-1
	CRTCs:      0 1 2
	Transform:  1.000000 0.000000 0.000000
	            0.000000 1.000000 0.000000
	            0.000000 0.000000 1.000000
	           filter: 
	Content Protection: Undesired 
		supported: Undesired, Desired, Enabled
	max bpc: 12 
		range: (8, 12)
	content type: No Data 
		supported: No Data, Graphics, Photo, Cinema, Game
	aspect ratio: Automatic 
		supported: Automatic, 4:3, 16:9
	Broadcast RGB: Automatic 
		supported: Automatic, Full, Limited 16:235
	audio: auto 
		supported: force-dvi, off, auto, on
	link-status: Good 
		supported: Good, Bad
	CONNECTOR_ID: 93 
		supported: 93
	non-desktop: 0 
		range: (0, 1)
HDMI-2 disconnected (normal left inverted right x axis y axis)
	Identifier: 0x44
	Timestamp:  372563
	Subpixel:   unknown
	Clones:    
	CRTCs:      0 1 2
	Transform:  1.000000 0.000000 0.000000
	            0.000000 1.000000 0.000000
	            0.000000 0.000000 1.000000
	           filter: 
	Content Protection: Undesired 
		supported: Undesired, Desired, Enabled
	max bpc: 12 
		range: (8, 12)
	content type: No Data 
		supported: No Data, Graphics, Photo, Cinema, Game
	aspect ratio: Automatic 
		supported: Automatic, 4:3, 16:9
	Broadcast RGB: Automatic 
		supported: Automatic, Full, Limited 16:235
	audio: auto 
		supported: force-dvi, off, auto, on
	link-status: Good 
		supported: Good, Bad
	CONNECTOR_ID: 99 
		supported: 99
	non-desktop: 0 
		range: (0, 1)
HDMI-3 disconnected (normal left inverted right x axis y axis)
	Identifier: 0x45
	Timestamp:  372563
	Subpixel:   unknown
	Clones:    
	CRTCs:      0 1 2
	Transform:  1.000000 0.000000 0.000000
	            0.000000 1.000000 0.000000
	            0.000000 0.000000 1.000000
	           filter: 
	Content Protection: Undesired 
		supported: Undesired, Desired, Enabled
	max bpc: 12 
		range: (8, 12)
	content type: No Data 
		supported: No Data, Graphics, Photo, Cinema, Game
	aspect ratio: Automatic 
		supported: Automatic, 4:3, 16:9
	Broadcast RGB: Automatic 
		supported: Automatic, Full, Limited 16:235
	audio: auto 
		supported: force-dvi, off, auto, on
	link-status: Good 
		supported: Good, Bad
	CONNECTOR_ID: 103 
		supported: 103
	non-desktop: 0 
		range: (0, 1)


I'm not sure what other side-effects these cause.

> Have you verified the issue with drmtip?

I haven't tried drmtip, but I'm running most recent kernel stable 5.1.15-gf0fae702de30 (released 40 hours ago)
If drmtip would've fixed it, I think they would've mentioned it here, no?

Thanks!
Comment 27 howaboutsynergy 2019-06-26 19:36:20 UTC
looks like full reboot isn't necessary but instead restarting X twice will clear both changes: first X restart will clear the xrandr Bad link status, and the second X restart will update/clear the change in ~/.config/xfce4/xfconf/xfce-perchannel-xml/displays.xml (it becomes `true` again)
Comment 28 Lakshmi 2019-06-27 07:02:14 UTC
> 
> > Have you verified the issue with drmtip?
> 
> I haven't tried drmtip, but I'm running most recent kernel stable
> 5.1.15-gf0fae702de30 (released 40 hours ago)
> If drmtip would've fixed it, I think they would've mentioned it here, no?

Drmtip (https://cgit.freedesktop.org/drm-tip) is a separate branch where you will find latest changes. It will take some time to get these changes to stable kernel.

Logs from drmtip will help from investigation point of view. Also, we need to know if this issue is same as the original bug comment 1, even though the error is same this could be a different issue.

Anyways logs from drmtip would be helpful.
Comment 29 howaboutsynergy 2019-06-27 09:56:17 UTC
Created attachment 144653 [details]
dmesg with drm.debug-0x1e on drm-tip (only within X)

Used this drm-tip:
commit cc7ffe2a301509ba233652ec17d0f74c18b5c8a9 (HEAD -> drm-tip, origin/drm-tip, origin/HEAD)
Date:   Thu Jun 27 10:28:10 2019 +0200

    drm-tip: 2019y-06m-27d-08h-26m-31s UTC integration manifest

Had the following changes in `drivers/gpu/drm/` so that hopefully dumping the stack each time -110 was hit(1+13 times) is helpful to see in the dmesg, however this does increase the time spent(by about 4-5 seconds total), so let me know if you want me to undo this and repost one clean-ish without it.

```patch
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index b9362b4f6353..c5f7f4c37565 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -76,7 +76,7 @@ config DRM_KMS_FB_HELPER
 	bool
 	depends on DRM_KMS_HELPER
 	select FB
-	select FRAMEBUFFER_CONSOLE if !EXPERT
+	select FRAMEBUFFER_CONSOLE
 	select FRAMEBUFFER_CONSOLE_DETECT_PRIMARY if FRAMEBUFFER_CONSOLE
 	select FB_SYS_FOPS
 	select FB_SYS_FILLRECT
diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
index 0b994d083a89..92b2199af2ad 100644
--- a/drivers/gpu/drm/drm_dp_helper.c
+++ b/drivers/gpu/drm/drm_dp_helper.c
@@ -264,10 +264,12 @@ static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
 			err = ret;
 	}
 
-	DRM_DEBUG_KMS("Too many retries, giving up. First error: %d\n", err);
+	DRM_ERROR("Too many retries, giving up. First error: %d\n", err);
+  dump_stack();
 	ret = err;
 
 unlock:
+	DRM_ERROR("Ret '%d' after %u retries\n", ret, retry);
 	mutex_unlock(&aux->hw_mutex);
 	return ret;
 }
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 1984e5c54d58..60004f8ae03a 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -2379,7 +2379,7 @@ EXPORT_SYMBOL(drm_fbdev_generic_setup);
  */
 int __init drm_fb_helper_modinit(void)
 {
-#if defined(CONFIG_FRAMEBUFFER_CONSOLE_MODULE) && !defined(CONFIG_EXPERT)
+#if defined(CONFIG_FRAMEBUFFER_CONSOLE_MODULE) && ( !defined(CONFIG_EXPERT) || defined(CONFIG_DRM_MODULE) || defined(CONFIG_DRM) )
 	const char name[] = "fbcon";
 	struct module *fbcon;
 
diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.c b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
index 9b1fccea966b..1242cf9e9828 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
@@ -89,6 +89,7 @@ intel_dp_set_link_train(struct intel_dp *intel_dp,
 	ret = drm_dp_dpcd_write(&intel_dp->aux, DP_TRAINING_PATTERN_SET,
 				buf, len);
 
+  DRM_ERROR("ret=%d len=%d\n", ret, len);
 	return ret == len;
 }
 
```

For full set of changes(including actual .config file used) see: https://gist.github.com/howaboutsynergy/9b6e890ec15592a55a0c07b5db234ce6
Comment 30 howaboutsynergy 2019-06-27 10:07:42 UTC
Created attachment 144654 [details]
cleaner dmesg with drm.debug=0x1e on drm-tip (only within X)

this one undid the patches to the 2 files, ie.

git checkout drivers/gpu/drm/i915/display/intel_dp_link_training.c drivers/gpu/drm/drm_dp_helper.c

so there's no stack dumping or needless logging, thus less time is spent in those functions

the log is cleaner this way, but if you want to see the caller stack for when -110 happens just look at prev. comment's attachment :D

What else can I do?
Thanks!
Comment 31 howaboutsynergy 2019-06-27 10:18:44 UTC
Created attachment 144655 [details]
dmesg with drm.debug=0x1e and i915.disable_power_well=0 (only within X)

`only within X` means that I reproduce the issue only within X(ie. `startx` command)
X.Org X Server 1.20.5
X Protocol Version 11, Revision 0

this attachment is just like the previous one but I've added kernel cmdline i915.disable_power_well=0  (which had no effect on the issue)
Comment 32 Manasi 2019-07-08 23:31:54 UTC
The Xserver logs report the link status as BAD because the kernel sends a uevent reporting this as soon as the link training fails or its unable to link train.

The aux timeouts that we have seen previously in case of eDP panels could be because the panel is not really ON when we try to link train and that can happen because the panel power cycle delay is not long enough.

Could you please try the attached patch that increases the panel power cycle delay to see if that fixes the issue and also send a dmesg log with drm.debug = 0xe?

Manasi
Comment 33 Manasi 2019-07-08 23:32:34 UTC
Created attachment 144729 [details] [review]
Increase panel power cycle delay test
Comment 34 howaboutsynergy 2019-07-09 13:14:09 UTC
Created attachment 144741 [details]
dmesg with patch from comment 33 and drm.debug=0xe

tl;dr: the issue persists even with that patch.

I tried these steps inside `startx`, twice:
turn off monitor, 
then ran `monoff` to put card(?) to sleep, 
then typed something to wake up card(while monitor is still off),
then turned on the monitor.
(there's some intentional delay(few seconds) between each step)

The second time I tried it, I failed to properly execute `logger` to add my own message(s) to `dmesg`, as follows:

```
user@i87k 2019/07/09 14:42:04 -bash5.0.7 t:4 j:0 d:3 pp:1149 p:1269 ut217
!36219 7 0  5.2.0-g0ecfebd2b524 #19 SMP Tue Jul 9 14:35:54 CEST 2019
/home/user 
$ logger "ok, monior is off, about to execute monoff"
-----------
user@i87k 2019/07/09 14:42:19 -bash5.0.7 t:4 j:0 d:3 pp:1149 p:1269 ut232
!36220 8 0  5.2.0-g0ecfebd2b524 #19 SMP Tue Jul 9 14:35:54 CEST 2019
/home/user 
$ monoff
setterm: terminal xterm-256color does not support --blank
-----------
user@i87k 2019/07/09 14:42:27 -bash5.0.7 t:4 j:0 d:3 pp:1149 p:1269 ut240
!36221 9 0  5.2.0-g0ecfebd2b524 #19 SMP Tue Jul 9 14:35:54 CEST 2019
/home/user 
$ ld someething to wake up the card"
> logger "monitor is still off, btw"
> logger "about to tun on the monitor"
> ^C
-----------
user@i87k 2019/07/09 14:43:31 -bash5.0.7 t:4 j:0 d:3 pp:1149 p:1269 ut303
!36222 9 0  5.2.0-g0ecfebd2b524 #19 SMP Tue Jul 9 14:35:54 CEST 2019
/home/user 
$
```

in the above, the command:
`$ ld someething to wake up the card"`
is supposed to be:
`$ logger "typed someething to wake up the card"`
but somehow those keys(`ogger "type`) got eaten up, probably due to drm doing its thing after my typing woke up the card

The 'utNUM' at the end of my prompt is uptime(seconds), it might help correlate the events with the ones in dmesg.
Comment 35 howaboutsynergy 2019-07-09 13:24:41 UTC
(In reply to Manasi from comment #32)
the panel is definitely not on when drm tries to link train, ie. the monitor is turned off. Given this, I wonder if this is even an issue anymore?

In other words, since the monitor is off when I type a few keys on the keyboard to wake up the card(which it thinks it already put the monitor to sleep), then drm trying link-training with a monitor that it turned off(from the monitor's button that is, not unplugged from power though) would always fail, right? But as soon as monitor is turned on, then it re-linktrains and works fine. So, is it even an issue? I guess Comment 25 was right, it's just the log messages saying when link training failed. If it weren't for the other two side-effects (see Comment 26), it wouldn't really be an issue and I guess there's nothing you can do, unless the kernel does not send a uevent reporting that the link training succeeded, but if it already does, well then I'm stuck with those side effects.
Comment 36 Jyoti Yadav 2019-07-12 05:27:36 UTC
is issue recently reproduced on latest drm tip? one similar issue on ICL -lp got resolved with 5.2.0 kernel.
Comment 37 Jyoti Yadav 2019-08-09 05:50:58 UTC
any update here?


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.