Bug 46145 - [SNB]missing IRQ in kernel drm-intel-next-queued branch
Summary: [SNB]missing IRQ in kernel drm-intel-next-queued branch
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assignee: Ben Widawsky
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-15 23:22 UTC by lu hua
Modified: 2017-10-06 14:51 UTC
CC List: 7 users

See Also:
i915 platform:
i915 features:


Attachments
reinstate the hwstam magic (1.13 KB, patch)
2012-02-17 02:10 UTC, Daniel Vetter
enable CS reg readback w/a for snb (634 bytes, patch)
2012-03-23 02:33 UTC, Daniel Vetter

Description lu hua 2012-02-15 23:22:06 UTC
System Environment:
--------------------------
Arch:                 i386
Platform:             Sandybridge
Libdrm:              (master)2.4.31-4-g9b3ad51ae5fd9654df8ef75de845a519015150bb
Mesa:                (master)78734e375a0e3ea87abd6d5b2f85946e78e96015
Xserver:             (master)xorg-server-1.11.99.903-1-gd53235af85d50774c68347720ce132daf9a5bc49
Libva_intel_driver:  (vaapi-ext)d0cf73ad1dc66a8fa5911acb837c08604cc51940
Kernel:              (drm-intel-next-queued)7c26e5c6edaec70f12984f7a3020864cc21e6fec

Bug detailed description:
-----------------------------
The missed IRQ occurs on Sandybridge with the drm-intel-next-queued kernel branch while running some piglit glean cases (glean_clipFlat, glean_pointAtten, glean_polygonOffset). It does not happen on Ivybridge, nor with kernel 3.2.4 on Sandybridge.

Reproduce steps:
----------------------------
1. Start X.
2. Start gnome-session.
3. Run: ./glean -r /tmp/results/glean/pointAtten -o -v -v -v -t +pointAtten
Comment 1 Chris Wilson 2012-02-16 01:09:45 UTC
So I guess this is just the missing merge from -fixes?
Comment 2 Daniel Vetter 2012-02-16 01:39:49 UTC
The -queued sha1 you've tested on does not yet contain the voodoo stuff for snb. Please retest with an updated -queued and ensure that you have

commit 99ffa1629d737295e569267cf5940758139f9ddb
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jan 25 14:04:00 2012 +0100

    drm/i915: enable forcewake voodoo also for gen6

in your kernel.
Comment 3 lu hua 2012-02-17 01:23:52 UTC
Tested with the latest -queued branch kernel (de67cba65944f26c0f147035bd62e);
the missed IRQ still happens.
Comment 4 Daniel Vetter 2012-02-17 02:10:12 UTC
Created attachment 57200
reinstate the hwstam magic

Please try the attached patch.

Also, this 'missed IRQ' issue showed up before I applied the voodoo patch for snb (which reverted the hwstam w/a), so something else changed. Can you please bisect where this issue was introduced so that we have a chance to find out what went wrong?

Please do the bisect even if the attached patch works; it's _very_ important for us to get a handle on these 'missed IRQ' issues.
Comment 5 lu hua 2012-02-20 18:33:06 UTC
With the patch from attachment 57200 applied, the issue does not happen.
Bisect is in progress.
Selected good commit d8e70a254d8f2da141006e496a51502b79115e80 and bad commit 7c26e5c6edaec70f12984f7a3020864cc21e6fec.
Comment 6 Daniel Vetter 2012-02-21 01:11:24 UTC
Another patch to try:

https://bugs.freedesktop.org/attachment.cgi?id=57315

Please also check whether this patch gets rid of the missed IRQ issue.
Comment 7 lu hua 2012-02-22 21:32:50 UTC
The issue does not happen with the new patch you provided.
Comment 8 lu hua 2012-02-27 00:07:24 UTC
Bisect shows 8a8ed1f5143b3df312e436ab15290e4a7ca6a559 is the first bad commit.
commit 8a8ed1f5143b3df312e436ab15290e4a7ca6a559
Author:     Yufeng Shen <miletus@chromium.org>
AuthorDate: Mon Feb 13 17:36:54 2012 -0500
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Tue Feb 14 10:39:53 2012 +0100

    drm/i915: Fix race condition in accessing GMBUS

    GMBUS has several ports and each has it's own corresponding
    I2C adpater. When multiple I2C adapters call gmbus_xfer() at
    the same time there is a race condition in using the underlying
    GMBUS controller. Fixing this by adding a mutex lock when calling
    gmbus_xfer().

    v2: Moved gmbus_mutex below intel_gmbus and added comments.
    Rebased to drm-intel-next-queued.

    Signed-off-by: Yufeng Shen <miletus@chromium.org>
    [danvet: Shortened the gmbus_mutex comment a bit and add the patch
    revision comment to the commit message.]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
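The commit message above describes serializing every GMBUS transfer behind one mutex, because several I2C adapters (one per GMBUS port) share a single controller. Below is a minimal userspace sketch of that pattern with POSIX threads, not the i915 code: `struct shared_controller`, `controller_xfer()`, `run_adapters()` and the other names are hypothetical, and the `in_use` flag exists only to show that no two transfers overlap once the lock is held for the whole transfer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one controller shared by several I2C adapters
 * (one adapter per port). Names are illustrative, not the i915 code. */
struct shared_controller {
	pthread_mutex_t lock; /* plays the role of the gmbus mutex */
	bool in_use;          /* true while a transfer owns the hardware */
	long transfers;       /* completed transfers */
	long overlaps;        /* transfers that found the hardware busy */
};

/* Stand-in for a gmbus_xfer()-style entry point: the whole transfer runs
 * with the lock held, so two adapters never drive the controller at once. */
static void controller_xfer(struct shared_controller *c)
{
	pthread_mutex_lock(&c->lock);
	if (c->in_use)
		c->overlaps++; /* cannot happen while the lock is held */
	c->in_use = true;
	/* ... select port, transfer bytes, wait for completion ... */
	c->in_use = false;
	c->transfers++;
	pthread_mutex_unlock(&c->lock);
}

struct adapter_arg {
	struct shared_controller *c;
	int n; /* transfers per adapter */
};

static void *adapter_thread(void *p)
{
	struct adapter_arg *a = p;
	for (int i = 0; i < a->n; i++)
		controller_xfer(a->c);
	return NULL;
}

/* Runs nthreads (1..8) adapters concurrently, each doing n transfers,
 * and returns the total number of completed transfers. */
static long run_adapters(struct shared_controller *c, int nthreads, int n)
{
	pthread_t tid[8];
	struct adapter_arg arg = { c, n };
	assert(nthreads > 0 && nthreads <= 8);
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tid[i], NULL, adapter_thread, &arg);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	return c->transfers;
}
```

For example, four adapter threads doing 1000 transfers each should complete 4000 transfers with `overlaps` still 0. Holding the lock across the entire transfer, rather than around individual register writes, is what keeps one adapter from reprogramming the port in the middle of another adapter's transaction.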
Comment 9 Daniel Vetter 2012-02-27 01:09:24 UTC
This bisect result is pretty astonishing ... Can you confirm this by reverting the offending commit on top of latest -testing?
Comment 10 lu hua 2012-02-28 01:46:42 UTC
The issue does not happen on the latest -testing branch, nor with commit 8a8ed1f5143b3df312e436ab15290e4a7ca6a559 reverted.

Retesting on the branch at commit 8a8ed1f5143b3df312e436ab15290e4a7ca6a559, I cannot reproduce it, so the bisect result needs reconfirmation.
Comment 11 Daniel Vetter 2012-02-28 02:54:51 UTC
Hm, this sounds like a Heisenbug that shows up and disappears again. Closing as 'worksforme' for now, please reopen if this shows up again somewhere.
Comment 12 lu hua 2012-03-12 02:38:50 UTC
With compiz enabled, the issue happens again.
Comment 13 Guang Yang 2012-03-14 00:39:40 UTC
System Environment:
--------------------------
Platform:             Sandybridge
Kernel: (drm-intel-next-queued)fa37d39e4c6622d80bd8061d600701bcea1d6870

Bug detailed description:
-----------------------------
Running gem_dummy_reloc_loop from intel-gpu-tools, dmesg shows:

[  111.853298] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 7769971, at 7769971], missed IRQ?
[  113.692050] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 7823757, at 7823757], missed IRQ?
[  118.119401] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 8285985, at 8285985], missed IRQ?
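Each log line says the hangcheck timer found the blt ring idle with its hardware seqno already at the seqno a waiter is sleeping on ("waiting on N, at N"), yet that waiter was never woken, which is why the driver guesses at a missed interrupt. A minimal sketch of that heuristic follows, assuming a simplified ring model: `struct ring_model`, `has_waiters`, `seqno_passed()` and `looks_like_missed_irq()` are hypothetical names, not the i915 source. The wrap-safe signed-difference comparison is the usual idiom for 32-bit sequence counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of one ring as seen by a hangcheck timer.
 * These fields are illustrative and are not the real i915 structures. */
struct ring_model {
	const char *name;
	uint32_t hw_seqno;      /* last seqno the GPU has written back */
	uint32_t waiting_seqno; /* seqno a sleeping waiter is blocked on, 0 = none */
	bool has_waiters;       /* stands in for "the wait queue is non-empty" */
};

/* Wrap-safe check that seq1 is at or after seq2 on a 32-bit counter. */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* True if the GPU already reached the waited-for seqno but a waiter is
 * still asleep, i.e. the completion interrupt was apparently missed.
 * A real driver would also wake the waiters at this point. */
static bool looks_like_missed_irq(const struct ring_model *ring)
{
	assert(ring != NULL);
	if (!ring->has_waiters || ring->waiting_seqno == 0)
		return false;
	if (!seqno_passed(ring->hw_seqno, ring->waiting_seqno))
		return false; /* still busy: not a missed IRQ (yet) */
	fprintf(stderr, "%s idle [waiting on %u, at %u], missed IRQ?\n",
		ring->name, (unsigned)ring->waiting_seqno,
		(unsigned)ring->hw_seqno);
	return true;
}
```

With `hw_seqno` and `waiting_seqno` both at 7769971, as in the first log line, the sketch reports a missed IRQ; with the hardware still one seqno behind, it does not.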
Comment 14 Daniel Vetter 2012-03-18 09:34:37 UTC
I hate this bug :( Can you please check whether the patch at:

https://bugs.freedesktop.org/attachment.cgi?id=57315

still works around this issue?
Comment 15 Ben Widawsky 2012-03-18 15:30:10 UTC
I ran into this bug with my HW context work, and the patch from https://bugs.freedesktop.org/attachment.cgi?id=57315 seems to make the problem go away.
Comment 16 lu hua 2012-03-22 22:21:05 UTC
Running the latest -queued branch (121d527a32) with the patch https://bugs.freedesktop.org/attachment.cgi?id=57315:
With compiz disabled, the issue doesn't happen.
With compiz enabled, the issue still exists.
dmesg shows:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 28554476, at 28554476], missed IRQ?
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 31826421, at 31826421], missed IRQ?
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 32373088, at 32373088], missed IRQ?
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 35318264, at 35318264], missed IRQ?
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 35418763, at 35418763], missed IRQ?
Comment 17 Daniel Vetter 2012-03-23 02:33:49 UTC
Created attachment 58909
enable CS reg readback w/a for snb

Please apply this patch and check whether the issue goes away (in both configurations, i.e. compiz enabled and disabled).
Comment 18 lu hua 2012-03-26 20:37:12 UTC
Running the latest -queued branch (1d83f44) with the patch from attachment 58909:
With compiz disabled, the issue doesn't happen.
With compiz enabled, the issue still exists.
dmesg shows:
[  263.391270] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 762923, at 762923], missed IRQ?
[  390.382665] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 5576121, at 5576121], missed IRQ?
Comment 19 Daniel Vetter 2012-03-27 00:29:40 UTC
Hm, I have a gut feeling that this is just an issue with the blt ring. Ben, do you still have patches around to use pipe notify irq and MI_FLUSH_DW writes for the seqno+irq generation on blt?
Comment 20 Ben Widawsky 2012-03-27 07:28:38 UTC
Yes. My branch here has both workarounds.

It's missing the readback though, and the last revert commit should be removed.

http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=irq_fix
Comment 21 Florian Mickler 2012-04-16 14:34:51 UTC
A patch referencing this bug report has been merged in Linux v3.4-rc2:

commit 1c7eaac737e4cca24703531ebcb566afc3ed285f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Mar 27 09:31:24 2012 +0200

    drm/i915: apply CS reg readback trick against missed IRQ on snb
Comment 22 lu hua 2012-04-24 02:26:58 UTC
Verified. It doesn't happen on drm-intel-next-queued with commit b98e5240b362e702355ffedba05aeb589dfbcbe2.
Comment 23 lu hua 2012-04-26 02:05:54 UTC
Reopening: the following piglit cases hit this issue on the -queued kernel at commit 74d2c584c37c4fd7ab0f40d2fb546559992c4f9b:
glx_GLX_ARB_create_context_default_minor_version
shaders_glsl-vs-user-varying-ff
spec_OpenGL_1.1_texwrap-2D-GL_LUMINANCE4
spec_glsl-1.10_compiler_special-characters_digraph-open-bracket.frag
spec_glsl-1.10_execution_built-in-functions_fs-op-mult-ivec2-ivec2
spec_glsl-1.20_compiler_built-in-functions_op-mult-ivec3-int.vert
Comment 24 Daniel Vetter 2012-04-26 03:26:37 UTC
Can you please retest with the latest -queued, specifically with this patch:

commit 63c02a10149080afc8fd616b5e1aa6a1e72352fb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 24 21:48:47 2012 +0100

    drm/i915: Use a global lock for modifying global irq flags
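The commit title refers to guarding global interrupt-mask state with a single lock. The usual hazard such a lock addresses is a read-modify-write of a register shared by several rings: if each ring only serialized against itself, two rings could both read the old mask and one update would be lost. A minimal sketch of the locked pattern, assuming a refcounted get/put per ring over one shared mask word (a set bit masks that source); `irq_lock`, `irq_mask_reg`, `ring_irq_get()`, `ring_irq_put()` and `run_rings()` are hypothetical names, not the driver's API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_RINGS 3

/* Hypothetical shared state: one mask word for all rings, guarded by one
 * global lock. A set bit means that ring's interrupt is masked. */
static pthread_mutex_t irq_lock = PTHREAD_MUTEX_INITIALIZER;
static uint32_t irq_mask_reg = (1u << NUM_RINGS) - 1; /* all masked */
static int irq_refcount[NUM_RINGS];

/* First user of a ring's interrupt unmasks its bit. */
static void ring_irq_get(int ring)
{
	assert(ring >= 0 && ring < NUM_RINGS);
	pthread_mutex_lock(&irq_lock);
	if (irq_refcount[ring]++ == 0)
		irq_mask_reg &= ~(1u << ring); /* read-modify-write under the lock */
	pthread_mutex_unlock(&irq_lock);
}

/* Last user of a ring's interrupt masks its bit again. */
static void ring_irq_put(int ring)
{
	assert(ring >= 0 && ring < NUM_RINGS);
	pthread_mutex_lock(&irq_lock);
	assert(irq_refcount[ring] > 0);
	if (--irq_refcount[ring] == 0)
		irq_mask_reg |= 1u << ring;
	pthread_mutex_unlock(&irq_lock);
}

/* Each thread hammers its own ring's bit, then leaves it enabled. */
static void *ring_worker(void *p)
{
	int ring = *(int *)p;
	for (int i = 0; i < 10000; i++) {
		ring_irq_get(ring);
		ring_irq_put(ring);
	}
	ring_irq_get(ring);
	return NULL;
}

/* Runs one worker per ring concurrently and returns the final mask. */
static uint32_t run_rings(void)
{
	pthread_t tid[NUM_RINGS];
	int ids[NUM_RINGS];
	for (int i = 0; i < NUM_RINGS; i++) {
		ids[i] = i;
		int rc = pthread_create(&tid[i], NULL, ring_worker, &ids[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NUM_RINGS; i++)
		pthread_join(tid[i], NULL);
	pthread_mutex_lock(&irq_lock);
	uint32_t snapshot = irq_mask_reg;
	pthread_mutex_unlock(&irq_lock);
	return snapshot;
}
```

Because every update to `irq_mask_reg` happens under the same lock, the final mask after all workers finish is 0 (every ring left enabled); with per-ring locks only, concurrent updates to different bits of the same word could overwrite each other.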
Comment 25 lu hua 2012-04-27 20:30:16 UTC
It doesn't happen on -queued kernel commit b57aa4007a558be50955f9b58f5da98fcb78aa85.
Comment 26 lu hua 2012-04-27 20:34:30 UTC
Verified. It is fixed on the -queued kernel at commit b57aa4007a558.
Comment 27 Elizabeth 2017-10-06 14:51:02 UTC
Closing old verified.

