Bug 64133 - linux-3.9: [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2013-05-01 19:48 UTC by Martin Mokrejs
Modified: 2016-10-03 09:11 UTC
Attachments
dmesg 3.9 attempt1 with both "*ERROR* dp aux" and "Wrong MCH_SSKPD" (110.59 KB, text/plain)
2013-05-01 19:48 UTC, Martin Mokrejs
dmesg 3.9 attempt2 without "*ERROR* dp aux" but with "Wrong MCH_SSKPD" (185.85 KB, text/plain)
2013-05-01 19:51 UTC, Martin Mokrejs
dmesg from linux-next-20130501 (after 3.9) (1.23 MB, text/plain)
2013-05-01 19:54 UTC, Martin Mokrejs
fix false timeouts due to higher scheduling latencies (2.10 KB, patch)
2013-05-02 09:19 UTC, Imre Deak
dmesg-3.9-drm-intel-20130502.txt (120.33 KB, text/plain)
2013-05-02 15:44 UTC, Martin Mokrejs

Description Martin Mokrejs 2013-05-01 19:48:06 UTC
Created attachment 78736 [details]
dmesg 3.9 attempt1 with both "*ERROR* dp aux" and "Wrong MCH_SSKPD"

This bug is present in 3.9 and linux-next-20130501. I discussed it with some developers in private emails, so I am pasting the whole exchange here.

On Wed, May 1, 2013 at 12:39 AM, Sedat Dilek <sedat.dilek@gmail.com> wrote:
> On Wed, May 1, 2013 at 12:30 AM, Martin Mokrejs
> <mmokrejs@fold.natur.cuni.cz> wrote:
>> Hi Sedat, [+Imre and +Daniel who were in the original thread]
>>
>> Sedat Dilek wrote:
>>> On Tue, Apr 30, 2013 at 11:56 PM, Martin Mokrejs
>>> <mmokrejs@fold.natur.cuni.cz> wrote:
>>>> Hi,
>>>>   I found the above error in my dmesg and found a thread
>>>> https://lkml.org/lkml/2013/2/27/275 . However, I am not sure whether
>>>> this is really a BIOS bug in my hardware or a kernel bug. Could you
>>>> please confirm whether I should blame Dell for this on my Dell
>>>> Vostro 3550 laptop? This is a Sandy Bridge-based laptop with no extra
>>>> graphics chip; it has only the Intel VGA bundled into my Core i7 CPU.
>>>> I am attaching a full dmesg from my system.
>>>>   Regarding symptoms ... I think at the time I got this error
>>>> message in dmesg I had my external HDMI1 screen working ... but I
>>>> admit I have "random" issues in general: my external HDMI1 display
>>>> is sometimes not detected during bootup and stays blank. xrandr says
>>>> HDMI1 is disconnected, and I cannot get it to enable HDMI1. Turning
>>>> the LCD itself off and on does not help, nor does manually setting
>>>> its input to HDMI (it also has a VGA D-SUB input socket). But these
>>>> issues exist in the 3.7/3.8 series as well. So maybe this ERROR
>>>> message, which landed in 3.9-rc1, is good for me. To date, I have the
>>>> latest BIOS, A11, for this model.
>>>>
>>>
>>> IIRC that was fixed with [1] and is included in Linux-v3.9?
>>
>> Seems not. This is vanilla 3.9.
>>
>
> I am on Linux-Next, try that or drm-intel-next?

Hm, random HDMI failures combined with the dp aux stuff being stuck
makes some sense. If drm-intel-nightly or linux-next don't cut it for
you, please file a bug report on bugs.freedesktop.org against DRI ->
DRM(Intel)

>>>
>>> Might be good to CC intel-gfx ML.
>>
>> Yeah, I wanted to ask first ... and due to the size of dmesg it would not
>> likely pass filters anyway ... Please Cc: the list for the sake of internet
>> archives if you find this *relevant*. I just do not know whether this is
>> a false alarm or not. Sorry for me being so suspicious. ;-)
>>
>>>
>>> AFAICS I have seen some (similar/same) drm/kms/i915 warnings/errors
>>> with v3.9-rc7+ Ubuntu/raring mainline kernels [1].
>>> But I might be wrong...
>>
>> OK, I did another cold boot and this time the ERROR message is gone, but the
>> line with Wrong MCH_SSKPD stays. Below are both snippets of two plain 3.9
>> bootups.
>>
>
> That one I reported, too.
> Daniel was reflecting on it, can't say if he made a decision.
> AFAICS harmless.

Worst case it can cause underruns, which especially on DP can cause
loss of sync and black screen.
-Daniel
Comment 1 Martin Mokrejs 2013-05-01 19:51:00 UTC
Created attachment 78737 [details]
dmesg 3.9 attempt2 without "*ERROR* dp aux" but with "Wrong MCH_SSKPD"

This is a slightly earlier email than the one pasted in the *original* report.

Hi Sedat, [+Imre and +Daniel who were in the original thread]

Sedat Dilek wrote:
> On Tue, Apr 30, 2013 at 11:56 PM, Martin Mokrejs
> <mmokrejs@fold.natur.cuni.cz> wrote:
>> Hi,
>>   I found the above error in my dmesg and found a thread
>> https://lkml.org/lkml/2013/2/27/275 . However, I am not sure whether
>> this is really a BIOS bug in my hardware or a kernel bug. Could you
>> please confirm whether I should blame Dell for this on my Dell
>> Vostro 3550 laptop? This is a Sandy Bridge-based laptop with no extra
>> graphics chip; it has only the Intel VGA bundled into my Core i7 CPU.
>> I am attaching a full dmesg from my system.
>>   Regarding symptoms ... I think at the time I got this error
>> message in dmesg I had my external HDMI1 screen working ... but I
>> admit I have "random" issues in general: my external HDMI1 display
>> is sometimes not detected during bootup and stays blank. xrandr says
>> HDMI1 is disconnected, and I cannot get it to enable HDMI1. Turning
>> the LCD itself off and on does not help, nor does manually setting
>> its input to HDMI (it also has a VGA D-SUB input socket). But these
>> issues exist in the 3.7/3.8 series as well. So maybe this ERROR
>> message, which landed in 3.9-rc1, is good for me. To date, I have the
>> latest BIOS, A11, for this model.
>>
>
> IIRC that was fixed with [1] and is included in Linux-v3.9?

Seems not. This is vanilla 3.9.

>
> Might be good to CC intel-gfx ML.

Yeah, I wanted to ask first ... and due to the size of dmesg it would not
likely pass filters anyway ... Please Cc: the list for the sake of internet
archives if you find this *relevant*. I just do not know whether this is
a false alarm or not. Sorry for me being so suspicious. ;-)

>
> AFAICS I have seen some (similar/same) drm/kms/i915 warnings/errors
> with v3.9-rc7+ Ubuntu/raring mainline kernels [1].
> But I might be wrong...

OK, I did another cold boot and this time the ERROR message is gone, but the
line with Wrong MCH_SSKPD stays. Below are both snippets of two plain 3.9 bootups.

[   15.923765] [drm] Memory usable by graphics device = 2048M
[   15.923854] i915 0000:00:02.0: setting latency timer to 64
[   15.955225] i915 0000:00:02.0: irq 48 for MSI/MSI-X
[   15.955388] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[   15.955389] [drm] Driver supports precise vblank timestamp query.
[   15.955965] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   15.977476] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[   15.986015] [drm] Wrong MCH_SSKPD value: 0x16040307
[   15.986017] [drm] This can cause pipe underruns and display issues.
[   15.986018] [drm] Please upgrade your BIOS to fix this.
[   16.059391] fbcon: inteldrmfb (fb0) is primary device
[   17.007481] Console: switching to colour frame buffer device 170x48
[   17.013832] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[   17.013837] i915 0000:00:02.0: registered panic notifier
[   17.069658] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[   17.130649] acpi device:34: registered as cooling_device2
[   17.136419] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[   17.137125] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input14
[   17.138272] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0


[   14.268697] [drm] Memory usable by graphics device = 2048M
[   14.268768] i915 0000:00:02.0: setting latency timer to 64
[   14.298145] i915 0000:00:02.0: irq 48 for MSI/MSI-X
[   14.298291] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[   14.298292] [drm] Driver supports precise vblank timestamp query.
[   14.298840] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   14.326522] [drm] Wrong MCH_SSKPD value: 0x16040307
[   14.326524] [drm] This can cause pipe underruns and display issues.
[   14.326525] [drm] Please upgrade your BIOS to fix this.
[   14.343697] ata8: SATA link down (SStatus 0 SControl 0)
[   14.460403] fbcon: inteldrmfb (fb0) is primary device
[   15.614149] Console: switching to colour frame buffer device 170x48
[   15.620766] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[   15.620771] i915 0000:00:02.0: registered panic notifier
[   15.726872] acpi device:34: registered as cooling_device2
[   15.732434] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[   15.733122] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input14
[   15.734012] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[   16.066427] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
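
For comparing boots like the two above, a minimal Python sketch that scans a saved dmesg for the two messages discussed in this report may help. The message fragments are copied from the excerpts above; the function name scan_dmesg and the file name dmesg.txt are hypothetical and used only for illustration.

```python
import re

# Message fragments copied from the dmesg excerpts in this report.
PATTERNS = {
    "dp_aux_timeout": re.compile(r"\*ERROR\* dp aux hw did not signal timeout"),
    "wrong_mch_sskpd": re.compile(r"Wrong MCH_SSKPD value: (0x[0-9a-fA-F]+)"),
}

# Kernel timestamp prefix, e.g. "[   15.977476]".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def scan_dmesg(text):
    """Return {name: [(timestamp_or_None, line), ...]} for each known message."""
    hits = {name: [] for name in PATTERNS}
    for line in text.splitlines():
        m = STAMP.match(line)
        ts = float(m.group(1)) if m else None
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append((ts, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python scan_dmesg.py dmesg.txt
    with open(sys.argv[1], errors="replace") as fh:
        for name, found in scan_dmesg(fh.read()).items():
            print(name, len(found))
```

Run against the two excerpts above, this would report the dp aux error only for the first boot and the MCH_SSKPD warning for both.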


>
> - Sedat -
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=44498aea293b37af1d463acd9658cdce1ecdf427


Nice, detailed description, but it is beyond my knowledge.
Martin

> [2] http://kernel.ubuntu.com/~kernel-ppa/mainline/
>
>> Thank you for explaining what I am supposed to do. ;-)
>> Martin
>>
>>
>> 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
>>         Subsystem: Dell Device 04b3
>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>         Latency: 0
>>         Interrupt: pin A routed to IRQ 48
>>         Region 0: Memory at f6800000 (64-bit, non-prefetchable) [size=4M]
>>         Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
>>         Region 4: I/O ports at f000 [size=64]
>>         Expansion ROM at <unassigned> [disabled]
>>         Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
>>                 Address: fee0300c  Data: 4172
>>         Capabilities: [d0] Power Management version 2
>>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>>         Capabilities: [a4] PCI Advanced Features
>>                 AFCap: TP+ FLR+
>>                 AFCtrl: FLR-
>>                 AFStatus: TP-
>>         Kernel driver in use: i915
>>
>
>
Comment 2 Martin Mokrejs 2013-05-01 19:54:28 UTC
Created attachment 78738 [details]
dmesg from linux-next-20130501 (after 3.9)
Comment 3 Imre Deak 2013-05-02 09:19:30 UTC
Created attachment 78771 [details] [review]
fix false timeouts due to higher scheduling latencies

Could you check if the attached patch solves the issue? It applies on top of the http://cgit.freedesktop.org/~danvet/drm-intel/ drm-intel-nightly branch.
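
For readers who want the general idea behind "false timeouts due to higher scheduling latencies", here is a user-space Python sketch. It is not the attached patch and not kernel code. It only assumes what the attachment title suggests: a waiter that is scheduled late can find its deadline already expired even though the event it waited for has completed, so the condition is checked one more time before a timeout is reported. The name wait_for and its parameters are hypothetical.

```python
import time


def wait_for(condition, timeout_s, poll_s=0.001,
             clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it is true or timeout_s has elapsed.

    Returns True if the condition was observed, False on a genuine timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if condition():
            return True
        sleep(poll_s)
    # The waiter may have been scheduled late: the deadline passed while it
    # slept, but the event could have completed in the meantime. Check once
    # more so that a late wakeup is not misreported as a timeout.
    return bool(condition())
```

Without the final check, a waiter that overslept past the deadline would report a timeout even though the hardware had already signalled completion, which matches the symptom of an error that shows up on some boots and not on others.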
Comment 4 Martin Mokrejs 2013-05-02 15:44:43 UTC
Created attachment 78785 [details]
dmesg-3.9-drm-intel-20130502.txt

I tested the tree cloned with

git clone http://cgit.freedesktop.org/~danvet/drm-intel drm-intel-nightly 

with your patch on top of it, and the "*ERROR* dp aux" message is gone on the first attempt. The "Wrong MCH_SSKPD" message is still in the dmesg, but external HDMI1 works (in the fb console and in X11). I will reboot a few times to see whether that is stable.
Comment 5 Martin Mokrejs 2013-05-02 17:45:27 UTC
(In reply to comment #4)
> I will reboot a few times to see whether that is stable.

On the second trial, same thing: no "*ERROR* dp aux" in dmesg but still "Wrong MCH_SSKPD". Additionally, this time the framebuffer console did not turn on HDMI1, and therefore I get nothing on the external LCD in X11 either. But that is probably not related to this bug; it sometimes happened even with 3.7.x kernels. Unplugging and replugging the HDMI cable does not help either.
Comment 6 Imre Deak 2013-05-03 05:52:29 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > I will reboot a few times to see whether that is stable.
> 
> On the second trial, same thing: no "*ERROR* dp aux" in dmesg but still
> "Wrong MCH_SSKPD". Additionally, this time the framebuffer console did not
> turn on HDMI1, and therefore I get nothing on the external LCD in X11
> either. But that is probably not related to this bug; it sometimes happened
> even with 3.7.x kernels. Unplugging and replugging the HDMI cable does not
> help either.

Ok, thanks for testing this, we'll work on integrating some form of this fix.

Yea, the framebuffer/X startup problem sounds like a separate issue, I'd suggest opening a new bug with the usual Xorg.0.log and dmesg (with drm.debug=0xf kernel command line) combo attached.
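
As a side note on drm.debug=0xf: the value is a bitmask of debug categories. The Python sketch below decodes such a mask, assuming the category bits as I recall them from the DRM headers of that period (core 0x01, driver 0x02, KMS 0x04, PRIME 0x08). Treat these values as an assumption and check them against the kernel source actually in use. The helper name decode_drm_debug is hypothetical.

```python
# Assumed category bits (to be verified against the kernel's DRM headers):
# DRM_UT_CORE=0x01, DRM_UT_DRIVER=0x02, DRM_UT_KMS=0x04, DRM_UT_PRIME=0x08.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0xF))
```

Under that assumption, 0xf switches on all four categories, which is why the resulting dmesg is much larger than a normal one.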
Comment 7 Imre Deak 2013-05-16 09:21:37 UTC
*** Bug 64661 has been marked as a duplicate of this bug. ***
Comment 8 Daniel Vetter 2013-05-29 08:37:41 UTC
Ok, all parts of the fix have landed upstream, at least for the dp_aux timeout bug. Closing this one now.

The HDMI problem looks like an independent issue. For the sake of Bugzilla sanity, can you please file a new bug for it?
Comment 9 Jari Tahvanainen 2016-10-03 09:11:06 UTC
Failure Fixed+Verified, closing.

