Summary: | [IVB] missed IRQs | ||||||
---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | meng <mengmeng.meng> | ||||
Component: | DRM/Intel | Assignee: | Chris Wilson <chris> | ||||
Status: | CLOSED FIXED | QA Contact: | fangxun <xunx.fang> | ||||
Severity: | critical | ||||||
Priority: | high | CC: | ben, bo.b.wang, bo.c.li, chadversary, chris, daniel, eugeni, guang.a.yang, jbarnes, jian.j.zhao, keithp, oliver, ouping.zhang | ||||
Version: | unspecified | ||||||
Hardware: | x86 (IA32) | ||||||
OS: | Linux (All) | ||||||
Whiteboard: | |||||||
i915 platform: | i915 features: | ||||||
Bug Depends on: | |||||||
Bug Blocks: | 42991 | ||||||
Attachments: |
|
Description
meng
2011-06-30 21:47:20 UTC
Jesse, can you do the trivial extension of the SNB HWSTAM workaround and see if it prevents IVB from being so miserable?

The HWSTAM workaround for IVB has gone upstream. For the record, the commit is:
http://git.kernel.org/?p=linux/kernel/git/keithp/linux-2.6.git;a=commit;h=2b1ecb7337592a7bf0989efac46a5b52daab769e
drm/i915: apply HWSTAM writes to Ivy Bridge as well

Tested commit 2b1ecb7337592a7bf0989efac46a5b52daab769e; the problem still exists.

And the dmesg now says?

In dmesg:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 5112, at 5112], missed IRQ?
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 6566, at 6566], missed IRQ?

*** Bug 38413 has been marked as a duplicate of this bug. ***

Meng, you mention in bug 38863 that enabling semaphores works around the missed IRQ issue? Do you mind confirming that again?

(In reply to comment #8)
> Meng, you mention in bug 38863 that the enabling semaphores workarounds the
> missed IRQ issue? Do you mind confirming that again?
Hi Chris,
I have tried on IvyBridge with mesa master and kernel 3.1-rc4 with 3D demos like openarena and urbanterror. Without semaphores, the 3D games stutter and exit immediately with a GPU hang, with error messages like:
(EE) intel(0): Detected a hung GPU, disabling acceleration.
intel_do_flush_locked failed: Unknown error 18446744073709551611
and in dmesg the error messages are:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1230 at 1228, next 1231)
With semaphores set, there is no GPU hang, but there are still some error messages in dmesg like:
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 1394 at 1393, next 1397)
When testing with nexuiz, there are some missed IRQ errors in dmesg:
[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 33267, at 33267], missed IRQ?

I've done many, many runs of Nexuiz with semaphores enabled on my rev04...with or without Compiz...and I've never seen any missed IRQ messages. Haven't seen any GPU hangs in a long time, either.

I could not reproduce GPU hangs with Ken's patch from bug #38862 on IVB either, with or without compositing.
This is with kernel 3.1-rc7 and mesa master at 0527c11d7aa42bd74f4527d7299e3c18f37c4c44.
Could you retry with those versions please? If it still hangs for you, perhaps it is a different issue.

(In reply to comment #11)
> I could not reproduce GPU hangs with Ken's patch from bug #38862 on IVB either,
> with or without compositing.
> This is with kernel 3.1-rc7 and mesa master at
> 0527c11d7aa42bd74f4527d7299e3c18f37c4c44.
> Could you retry with those version please? If it still hangs for you, perhaps
> it is a different issue..
Yes. This bug is not tracking the GPU hang; it tracks the "Missing IRQ" error messages in dmesg, which still occur with the newest code on IvyBridge.
For the GPU hang, refer to bug #38863, which is marked as fixed now. I replied there that it works well with the mesa master branch, but I have not verified it yet because I think the fix should be cherry-picked to the 7.11 branch. What do you think?

Let's say this bug was to track screen stutter with missed IRQs. The GPU hang and stutter have been fixed on mesa master, so the remaining work here is cherry-picking to 7.11 (maybe related to bug #38863).
The remaining missed IRQ message issue (which does no harm) is to be tracked in bug #41439.

Does this still exist on the 7.11 branch?

*** Bug 41349 has been marked as a duplicate of this bug. ***

This bug still exists on the latest 7.11 branch. It also exists on the mesa master branch.

After some discussion in the bi-weekly Mesa team meeting today, this is a kernel issue. The difference between the systems that reproduce the bug and those that do not seems to be whether or not semaphores are enabled. Systems that work fine all have semaphores enabled. Ben has a kernel patch that may help, but it hasn't gone upstream yet.

I could reproduce the missed IRQ issue with the intel-gpu-tools test case test/gem_dummy_reloc_loop.
We reproduced it on an IvyBridge mobile platform; on a desktop platform the test case never finishes and gives no dmesg or error information. The issue can always be reproduced, whether semaphores are enabled or not.

(In reply to comment #19)
> I could reproduced the missed IRQ issue with Intel_gpu_tool test case
> test/gem_dummy_reloc_loop.
> We reproduced it on a IvyBridge mobile platform, and the case never finish
> without dmesg or error information on desktop platform. No matter enable
> semaphores or not, the issue could always be reproduced.
Sounds like we should track this in a separate bug, as this bug is supposed to be closed when semaphores are enabled.

> --- Comment #20 from Gordon Jin <gordon.jin@intel.com> 2011-11-22 16:29:05 PST ---
> (In reply to comment #19)
> > I could reproduced the missed IRQ issue with Intel_gpu_tool test case
> > test/gem_dummy_reloc_loop.
> > We reproduced it on a IvyBridge mobile platform, and the case never finish
> > without dmesg or error information on desktop platform. No matter enable
> > semaphores or not, the issue could always be reproduced.
>
> sounds like we should track at a separate bug, as this bug is supposed to be
> closed when semaphores enabled.
Not really, enabling semaphores just papers over it but the bug is still
there. And using better tests it looks like you can still easily hit it.
Chris has already marked bugs as duplicate of this, so let's keep this one
as the master bug for all things "missed IRQ" on ivb.
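
For anyone triaging the duplicates: the "missed IRQ?" hangcheck lines quoted in this report all follow the same shape, with the ring name followed by the seqno being waited on and the seqno the ring has reached. Below is a rough Python sketch for pulling those out of saved dmesg output. The regex is derived only from the lines pasted in this bug, and the function name `find_missed_irqs` is made up for illustration; other kernel versions may word the message differently.

```python
import re

# Matches lines of the form quoted in this report, e.g.
#   ... Hangcheck timer elapsed... blt ring idle [waiting on 5112, at 5112], missed IRQ?
# Assumption: the wording is exactly as pasted above.
MISSED_IRQ_RE = re.compile(
    r"Hangcheck timer elapsed\.\.\. (?P<ring>\w+) ring idle "
    r"\[waiting on (?P<waiting>\d+), at (?P<at>\d+)\], missed IRQ\?"
)


def find_missed_irqs(log_text):
    """Return (ring, waiting_seqno, current_seqno) for each missed-IRQ line."""
    return [
        (m.group("ring"), int(m.group("waiting")), int(m.group("at")))
        for m in MISSED_IRQ_RE.finditer(log_text)
    ]
```

When "waiting on" and "at" are equal, the ring had already reached the seqno the waiter wanted, yet the waiter was never woken. That is why the message asks "missed IRQ?" rather than reporting a GPU hang.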
*** Bug 43178 has been marked as a duplicate of this bug. ***

I agree with Daniel on this, so retitling as appropriate. Enabling semaphores should be good enough to hide the issue with first day benchmarks and we can cross our fingers that no-one sees this in the wild...

Based on our SNB experience, it should be, even though I still think it can be triggered by x11perf eventually.

*** Bug 43745 has been marked as a duplicate of this bug. ***

Created attachment 54473
Fallback to polling seqno
Whilst Jesse and Ben try to get the workaround out of the hw guys, here is one viable fallback method: poll!
Apply the patch and pass i915.irq_notify=3 as a boot parameter.

As the reporter Mengmeng has gone, I'm setting Fang Xun as the QA owner for this bug.
Xun, can you try this patch? Please also coordinate with Ouping for the power test result for this patch.

With the patch http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=irq-poll&id=b7f78c851817b6b14ec2000238ba9780445fc382 applied, running 3D games and power workloads on IVB produces no missed IRQ messages in dmesg.

(In reply to comment #25)
> Created attachment 54473
> Fallback to polling seqno
> Whilst Jesse and Ben try to get the workaround out of the hw guys, here is one
> viable fallback method: poll!
> Apply the patch and pass i915.irq_notify=3 as a boot parameter.
Bug 43178 can still be reproduced; it seems that this patch didn't fix it.
(In reply to comment #22)
> *** Bug 43178 has been marked as a duplicate of this bug. ***

Sorry, I forgot to add the ‘i915.irq_notify=3’ parameter to the kernel command line. After adding the ‘i915.irq_notify=3’ parameter, there is no missed IRQ error in dmesg.
(In reply to comment #28)
> bug 43178 can be reproduced, it seems that this patch didn't fix it.
> (In reply to comment #22)
> > *** Bug 43178 has been marked as a duplicate of this bug. ***

This bug still exists when running citybenchmark GL32 and ES32 with the latest kernel, which enables semaphores by default.
kernel: (drm-intel-next) d8e70a254d8f2da141006e496a51502b79115e80

A patch referencing this bug report has been merged in Linux v3.2-rc6:
commit f45b55575cedb7efa782e43f1ea74338456d0381
Author: Eugeni Dodonov <eugeni.dodonov@intel.com>
Date: Fri Dec 9 17:16:37 2011 -0800
drm/i915: enable semaphores on per-device defaults

I had been wondering about the progress here for quite a few days, until I just saw the patch in drm-intel-fixes. Xun, can you verify?

Verified with kernel (drm-intel-fixes) a4ea430853b71753103ec693acfc8624bd3e748e.

For the record, the workaround that works was:
commit 4cd53c0c8b01fc05c3ad5b2acdad02e37d3c2f55
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Fri Dec 14 16:01:25 2012 +0100
drm/i915: paper over missed irq issues with force wake voodoo
and is also required on snb, bug 45332.

(In reply to comment #34)
> For the record, the workaround that works was:
> commit 4cd53c0c8b01fc05c3ad5b2acdad02e37d3c2f55
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date: Fri Dec 14 16:01:25 2012 +0100
> drm/i915: paper over missed irq issues with force wake voodoo
> and is also required on snb, bug 45332.
For the record, it's in the 3.2.3 kernel.

Closing old verified.
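
For readers who want the idea behind the "Fallback to polling seqno" attachment without reading the patch: the sketch below is a user-space Python toy, not the kernel code and not taken from the attachment. It assumes a waiter that normally sleeps until an interrupt arrives, but wakes up periodically to re-read the seqno so that a lost interrupt only costs one poll interval. The `Ring` class, `wait_seqno`, and the interval values are all invented for illustration, and seqno wraparound is ignored.

```python
import threading
import time


class Ring:
    """Toy stand-in for a GPU ring: a seqno that advances plus an IRQ signal."""

    def __init__(self):
        self.seqno = 0
        self.irq = threading.Event()

    def complete(self, seqno, raise_irq=True):
        # Hardware finishes a request; the interrupt may or may not be delivered.
        self.seqno = seqno
        if raise_irq:
            self.irq.set()


def wait_seqno(ring, target, poll_interval=0.01, timeout=1.0):
    """Block until ring.seqno >= target; return False if the timeout expires.

    The wait on ring.irq is bounded by poll_interval, so the seqno is
    re-checked even when no interrupt is ever delivered.
    """
    deadline = time.monotonic() + timeout
    while ring.seqno < target:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        ring.irq.wait(min(poll_interval, remaining))
        ring.irq.clear()
    return True
```

The trade-off is the one raised in the comments above: polling hides the missed interrupt at the cost of extra wakeups, which is why the power test results were requested alongside the functional test.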