Bug 38862 - [IVB] missed IRQs
Status: VERIFIED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: high critical
Assigned To: Chris Wilson
QA Contact: fangxun
Duplicates: 38413 41349 43178 43745
Depends on:
Blocks: 42991
Reported: 2011-06-30 21:47 UTC by meng
Modified: 2012-02-05 18:56 UTC (History)

Attachments
Fallback to polling seqno (13.99 KB, patch)
2011-12-15 10:58 UTC, Chris Wilson

Description meng 2011-06-30 21:47:20 UTC
System Environment:
--------------------------
Platform: IVB
Libdrm:  (master)2.4.26
Mesa:    (7.11)b90c710c6cd8017f59b09d935fbbbe94ada81a12
Xserver: (server-1.10-branch)xorg-server-1.10.2.901-5-g79ef102c3adf7cae8982b05320109d0439e6587c
Xf86_video_intel:(master)2.15.0-152-g18d08e49d270b7a05f14a309759c9315e5ab9679
Cairo:    (master)ea645913ba8739377ee2e2b51480310befc19b76
Libva:    (master)5343740dfec289858cfafda64dd5260179d09d4f
Kernel:   (drm-intel-fixes) f01c22fd59aa10a3738ede20fd4b9b6fd1e2eac3

Bug detailed description:
-------------------------
With gnome-session and compiz enabled, the screen stutters when running 3D games or their demos (urbanterror, openarena) on IVB, and "Hangcheck" messages appear in dmesg. With semaphores enabled, it works fine. Running cat on i915_error_state shows that no error state was collected.
Notably, without compiz it works fine.

Reproduce steps:
----------------
1. Start gnome-session (with compiz enabled)
2. Run a 3D game or its demo
Comment 1 Chris Wilson 2011-07-01 00:33:01 UTC
Jesse, can you do the trivial extension of the SNB HWSTAM workaround and see if it prevents IVB from being so miserable?
Comment 2 Chris Wilson 2011-07-01 23:59:02 UTC
The HWSTAM for IVB workaround has gone upstream.
Comment 3 Gordon Jin 2011-07-03 17:46:42 UTC
For record of the commit info:
http://git.kernel.org/?p=linux/kernel/git/keithp/linux-2.6.git;a=commit;h=2b1ecb7337592a7bf0989efac46a5b52daab769e

drm/i915: apply HWSTAM writes to Ivy Bridge as well
Comment 4 meng 2011-07-03 18:15:00 UTC
Tested commit 2b1ecb7337592a7bf0989efac46a5b52daab769e; the problem still exists.
Comment 5 Chris Wilson 2011-07-04 08:06:48 UTC
And the dmesg now says?
Comment 6 meng 2011-07-04 16:18:20 UTC
In dmesg:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 5112, at 5112], missed IRQ?
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 6566, at 6566], missed IRQ?
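Note that in both messages the "waiting on" and "at" sequence numbers are equal: the ring had already reached the awaited seqno, but the waiter was never woken, which is why the kernel suspects a missed IRQ. A minimal sketch for pulling such lines out of a saved dmesg log follows. The regular expression is derived only from the two lines quoted above, so other kernel versions may word the message differently, and the name `find_missed_irqs` is hypothetical.

```python
import re

# Pattern built from the two hangcheck lines quoted in this comment only;
# this is an assumption about the message format, not a kernel guarantee.
MISSED_IRQ = re.compile(
    r"\[drm:i915_hangcheck_ring_idle\] \*ERROR\* Hangcheck timer elapsed\.\.\. "
    r"(?P<ring>\w+) ring idle "
    r"\[waiting on (?P<waiting>\d+), at (?P<at>\d+)\], missed IRQ\?"
)


def find_missed_irqs(log_text):
    """Return (ring, waiting_seqno, current_seqno) for each missed-IRQ line."""
    return [
        (m.group("ring"), int(m.group("waiting")), int(m.group("at")))
        for m in MISSED_IRQ.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... "
        "blt ring idle [waiting on 5112, at 5112], missed IRQ?\n"
        "[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... "
        "blt ring idle [waiting on 6566, at 6566], missed IRQ?\n"
    )
    for ring, waiting, at in find_missed_irqs(sample):
        # at >= waiting means the work finished but nobody was notified.
        print(ring, waiting, at, "completed" if at >= waiting else "pending")
```

Feeding it the output of `dmesg` saved to a file gives a quick count of how often each ring was affected during a test run.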
Comment 7 Chris Wilson 2011-07-19 12:42:43 UTC
*** Bug 38413 has been marked as a duplicate of this bug. ***
Comment 8 Chris Wilson 2011-09-06 05:37:10 UTC
Meng, you mention in bug 38863 that enabling semaphores works around the missed IRQ issue? Do you mind confirming that again?
Comment 9 zhao jian 2011-09-14 07:21:01 UTC
(In reply to comment #8)
> Meng, you mention in bug 38863 that the enabling semaphores workarounds the
> missed IRQ issue? Do you mind confirming that again?

Hi Chris, 
I tried 3D demos such as openarena and urbanterror on IvyBridge with mesa master and kernel 3.1-rc4. Without semaphores, the 3D games stutter and exit immediately with a GPU hang. The error message is:
(EE) intel(0): Detected a hung GPU, disabling acceleration.
intel_do_flush_locked failed: Unknown error 18446744073709551611
and dmesg shows:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1230 at 1228, next 1231)

With semaphores enabled, there is no GPU hang, but dmesg still shows errors such as:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1394 at 1393, next 1397)
When testing with nexuiz, dmesg also shows missed IRQ errors:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 33267, at 33267], missed IRQ?
Comment 10 Kenneth Graunke 2011-09-21 10:26:04 UTC
I've done many, many runs of Nexuiz with semaphores enabled on my rev04...with or without Compiz...and I've never seen any missed IRQ messages.  Haven't seen any GPU hangs in a long time, either.
Comment 11 Eugeni Dodonov 2011-09-27 13:28:06 UTC
I could not reproduce GPU hangs with Ken's patch from bug #38862 on IVB either, with or without compositing.

This is with kernel 3.1-rc7 and mesa master at 0527c11d7aa42bd74f4527d7299e3c18f37c4c44.

Could you retry with those versions, please? If it still hangs for you, perhaps it is a different issue.
Comment 12 zhao jian 2011-09-29 02:10:55 UTC
(In reply to comment #11)
> I could not reproduce GPU hangs with Ken's patch from bug #38862 on IVB either,
> with or without compositing.
> This is with kernel 3.1-rc7 and mesa master at
> 0527c11d7aa42bd74f4527d7299e3c18f37c4c44.
> Could you retry with those version please? If it still hangs for you, perhaps
> it is a different issue..

Yes. This bug does not track the GPU hang; it tracks the "Missing IRQ" error message in dmesg, which still occurs with the newest code on IvyBridge. For the GPU hang, refer to bug #38863, which is now marked as fixed. I replied there that it works well with the mesa master branch, but I have not verified it yet because I think the fix should be cherry-picked to the 7.11 branch. What do you think?
Comment 13 Gordon Jin 2011-09-29 19:34:30 UTC
Let's say this bug tracks the screen stutter with missed IRQs. The GPU hang and stutter have been fixed on mesa master, so the remaining work here is cherry-picking to 7.11 (maybe related to bug#38863).
The remaining missed IRQ message issue (which is otherwise harmless) is to be tracked in bug#41439.
Comment 14 Gordon Jin 2011-11-01 23:23:07 UTC
Does this still exist on 7.11 branch?
Comment 15 Eugeni Dodonov 2011-11-03 14:04:38 UTC
*** Bug 41349 has been marked as a duplicate of this bug. ***
Comment 16 fangxun 2011-11-03 23:40:33 UTC
This bug still exists on latest 7.11 branch.
Comment 17 fangxun 2011-11-04 02:15:42 UTC
It also exists on mesa master branch.
Comment 18 Ian Romanick 2011-11-07 10:35:57 UTC
After some discussion in the bi-weekly Mesa team meeting today, this is a kernel issue.  The difference between the systems that reproduce the bug and do not reproduce the bug seems to be whether or not semaphores are enabled.  Systems that work fine all have semaphores enabled.

Ben has a kernel patch that may help, but it hasn't gone upstream yet.
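To compare a reproducing system against a working one, the semaphore setting can be read back from sysfs. The sketch below assumes the driver exposes a `semaphores` parameter under /sys/module/i915/parameters and that the file is readable by the current user (it may require root). The helper names are hypothetical, and the interpretation of the values, in particular treating -1 as "per-device default", is an assumption that should be checked against the running kernel.

```python
from pathlib import Path


def read_module_param(name, module="i915", sysfs_root="/sys/module"):
    """Return the parameter value as a string, or None if it cannot be read."""
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except OSError:
        # Missing file, missing module, or insufficient permissions.
        return None


def describe_semaphores(value):
    """Map a raw parameter value to a label (mapping is an assumption)."""
    if value is None:
        return "unknown"
    if value == "0":
        return "disabled"
    if value == "-1":
        return "per-device default"
    return "enabled"


if __name__ == "__main__":
    raw = read_module_param("semaphores")
    print("i915 semaphores:", describe_semaphores(raw), "(raw value: %r)" % raw)
```

Recording this value alongside each test result makes it easier to tell whether a clean run was clean only because semaphores were on.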
Comment 19 Yi Sun 2011-11-22 00:49:16 UTC
I could reproduce the missed IRQ issue with the Intel_gpu_tool test case test/gem_dummy_reloc_loop.

We reproduced it on an IvyBridge mobile platform; on the desktop platform the test case never finishes and produces no dmesg or error output. The issue can always be reproduced, whether or not semaphores are enabled.
Comment 20 Gordon Jin 2011-11-22 16:29:05 UTC
(In reply to comment #19)
> I could reproduced the missed IRQ issue with Intel_gpu_tool test case
> test/gem_dummy_reloc_loop. 
> We reproduced it on a IvyBridge mobile platform, and the case never finish
> without dmesg or error information on desktop platform. No matter enable
> semaphores or not, the issue could always be reproduced.

Sounds like we should track this in a separate bug, as this bug is supposed to be closed once semaphores are enabled.
Comment 21 Daniel Vetter 2011-11-23 02:16:23 UTC
> --- Comment #20 from Gordon Jin <gordon.jin@intel.com> 2011-11-22 16:29:05 PST ---
> (In reply to comment #19)
> > I could reproduced the missed IRQ issue with Intel_gpu_tool test case
> > test/gem_dummy_reloc_loop. 
> > We reproduced it on a IvyBridge mobile platform, and the case never finish
> > without dmesg or error information on desktop platform. No matter enable
> > semaphores or not, the issue could always be reproduced.
> 
> sounds like we should track at a separate bug, as this bug is supposed to be
> closed when semaphores enabled.

Not really, enabling semaphores just papers over it but the bug is still
there. And using better tests it looks like you can still easily hit it.

Chris has already marked bugs as duplicate of this, so let's keep this one
as the master bug for all things "missed IRQ" on ivb.
Comment 22 Daniel Vetter 2011-11-23 02:18:22 UTC
*** Bug 43178 has been marked as a duplicate of this bug. ***
Comment 23 Chris Wilson 2011-11-23 02:26:20 UTC
I agree with Daniel on this, so retitling as appropriate. Enabling semaphores should be good enough to hide the issue with first day benchmarks and we can cross our fingers that no-one sees this in the wild... Based on our SNB experience, it should be, even though I still think it can be triggered by x11perf eventually.
Comment 24 Chris Wilson 2011-12-12 06:06:09 UTC
*** Bug 43745 has been marked as a duplicate of this bug. ***
Comment 25 Chris Wilson 2011-12-15 10:58:00 UTC
Created attachment 54473
Fallback to polling seqno

Whilst Jesse and Ben try to get the workaround out of the hw guys, here is one viable fallback method: poll!

Apply the patch and pass i915.irq_notify=3 as a boot parameter.
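The idea behind the fallback can be illustrated in user space. The following is a conceptual sketch only, not the attached kernel patch: the `Ring` class, the `wait_seqno` function, and all timeouts are hypothetical. It waits for a notification first and, if none arrives, falls back to polling the sequence number directly, so a completed request is still noticed when the interrupt is lost.

```python
import threading
import time


class Ring:
    """Toy stand-in for a GPU ring: a seqno plus an 'interrupt' event."""

    def __init__(self):
        self.seqno = 0
        self.irq = threading.Event()


def wait_seqno(ring, target, irq_timeout=0.05, poll_interval=0.001, deadline=1.0):
    """Wait until ring.seqno >= target.

    Returns "already", "irq", "poll", or "timeout" to show which path fired.
    """
    if ring.seqno >= target:
        return "already"
    # Normal path: sleep until the (simulated) interrupt arrives.
    if ring.irq.wait(irq_timeout) and ring.seqno >= target:
        return "irq"
    # Fallback path: the interrupt may have been missed, so poll the seqno.
    end = time.monotonic() + deadline
    while time.monotonic() < end:
        if ring.seqno >= target:
            return "poll"
        time.sleep(poll_interval)
    return "timeout"


if __name__ == "__main__":
    ring = Ring()

    def finish_without_irq():
        ring.seqno = 1  # work completes, but no interrupt is raised

    threading.Timer(0.01, finish_without_irq).start()
    print(wait_seqno(ring, 1))
```

Polling trades power and CPU time for robustness, which is presumably why the power test results are of interest for this patch.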
Comment 26 Gordon Jin 2011-12-15 18:52:34 UTC
As the reporter Mengmeng has gone, I'm setting Fang Xun as the QA owner for this bug.

Xun, can you try this patch? Please also coordinate with Ouping for the power test result for this patch.
Comment 27 Ouping Zhang 2011-12-16 03:20:46 UTC
With the patch http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=irq-poll&id=b7f78c851817b6b14ec2000238ba9780445fc382 applied, I ran 3D games and power workloads on IVB; there are no missed IRQ messages in dmesg.
(In reply to comment #25)
> Created attachment 54473
> Fallback to polling seqno
> Whilst Jesse and Ben try to get the workaround out of the hw guys, here is one
> viable fallback method: poll!
> Apply the patch and pass i915.irq_notify=3 as a boot parameter.
Comment 28 Ouping Zhang 2011-12-19 22:29:20 UTC
Bug 43178 can still be reproduced; it seems that this patch didn't fix it.
(In reply to comment #22)
> *** Bug 43178 has been marked as a duplicate of this bug. ***
Comment 29 Ouping Zhang 2011-12-19 23:41:35 UTC
Sorry, I forgot to add the ‘i915.irq_notify=3’ parameter to the kernel command line. With the ‘i915.irq_notify=3’ parameter added, there is no missed IRQ error in dmesg.
(In reply to comment #28)
> bug 43178 can be reproduced, it seems that this patch didn't fix it.
> (In reply to comment #22)
> > *** Bug 43178 has been marked as a duplicate of this bug. ***
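Since a forgotten boot parameter produced a false negative here, it may help to confirm the running kernel's command line before testing. A minimal sketch follows, assuming a Linux system where /proc/cmdline is readable. The function name `kernel_param` is hypothetical, and `i915.irq_notify` only has an effect with the patch from comment 25 applied.

```python
def kernel_param(cmdline, name):
    """Return the value of a name=value token on the command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            # Keep the last occurrence; how the kernel resolves duplicates
            # is not checked here (assumption).
            value = val
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""  # not on Linux, or /proc is unavailable
    value = kernel_param(cmdline, "i915.irq_notify")
    if value == "3":
        print("polling fallback requested (i915.irq_notify=3)")
    else:
        print("i915.irq_notify=3 not set; found %r" % value)
```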
Comment 30 libo 2012-01-10 00:01:40 UTC
This bug still exists when running citybenchmark GL32 and ES32 with the latest kernel, which enables semaphores by default.

kernel: (drm-intel-next) d8e70a254d8f2da141006e496a51502b79115e80
Comment 31 Florian Mickler 2012-01-12 14:20:44 UTC
A patch referencing this bug report has been merged in Linux v3.2-rc6:

commit f45b55575cedb7efa782e43f1ea74338456d0381
Author: Eugeni Dodonov <eugeni.dodonov@intel.com>
Date:   Fri Dec 9 17:16:37 2011 -0800

    drm/i915: enable semaphores on per-device defaults
Comment 32 Gordon Jin 2012-01-30 23:49:52 UTC
I had been wondering about the progress here for quite a few days, until I just saw the patch in drm-intel-fixes.

Xun, can you verify?
Comment 33 fangxun 2012-02-01 04:30:35 UTC
Verified with kernel(drm-intel-fixes) a4ea430853b71753103ec693acfc8624bd3e748e.
Comment 34 Chris Wilson 2012-02-01 04:42:35 UTC
For the record, the workaround that works was:

commit 4cd53c0c8b01fc05c3ad5b2acdad02e37d3c2f55
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Dec 14 16:01:25 2012 +0100

    drm/i915: paper over missed irq issues with force wake voodoo

and is also required on snb, bug 45332.
Comment 35 Gordon Jin 2012-02-05 18:56:03 UTC
(In reply to comment #34)
> For the record, the workaround that works was:
> commit 4cd53c0c8b01fc05c3ad5b2acdad02e37d3c2f55
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Fri Dec 14 16:01:25 2012 +0100
>     drm/i915: paper over missed irq issues with force wake voodoo
> and is also required on snb, bug 45332.

For the record, it's in 3.2.3 kernel.