Bug 78681

Summary: [ILK] random missed interrupts from BSD
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: cprigent <christophe.prigent>
Status: CLOSED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: minor
Priority: medium
CC: christophe.prigent, intel-gfx-bugs, jinxianx.guo, yi.sun
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform: ILK
i915 features: GEM/Other
Attachments:
Description                      Flags
dmesg                            none
dmesg(kernel 333cf6_20150403)    none

Description lu hua 2014-05-14 05:36:02 UTC
Created attachment 99006 [details]
dmesg

System Environment:
--------------------------
Platform:         Ironlake
Kernel:           (drm-intel-nightly)2be456541ea41728002ccca2de5235f48d14326e

Bug detailed description:
-------------------------
The test randomly triggers "[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... bsd ring idle" on Ironlake with the -nightly and -queued kernels. It happens in about 1 in 10 runs. Over 20 cycles on the -fixes kernel, it works well.

output:
IGT-Version: 1.6-g351e7d3 (x86_64) (Linux: 3.15.0-rc3_drm-intel-nightly_2be456_20140514+ x86_64)
Test requirement not met in function __real_main319, file gem_ring_sync_copy.c:342:
Last errno: 0, Success
Test requirement: (!(data.render.copy))
no render-copy function
Subtest sync-render-blitter-write-read: SKIP
Subtest sync-render-blitter-read-write: SKIP
Subtest sync-render-blitter-write-write: SKIP
Subtest sync-blitter-render-write-read: SKIP
Subtest sync-blitter-render-read-write: SKIP
Subtest sync-blitter-render-write-write: SKIP

Reproduce steps:
----------------
1. ./gem_ring_sync_copy
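
Since the failure is intermittent, the reproduce step has to be repeated many times. The following is a minimal sketch of such a loop, not the reporter's actual harness: the helper names (read_dmesg, failing_cycles) are hypothetical, and it assumes that plain "dmesg" is readable by the current user and that the error text is exactly the "Hangcheck timer elapsed" string quoted above.

```python
import subprocess
from typing import Callable, List, Optional

# Error text quoted in the bug description.
MARKER = "Hangcheck timer elapsed"


def read_dmesg() -> str:
    """Return the current kernel log as text (assumes `dmesg` is permitted)."""
    return subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=False
    ).stdout


def failing_cycles(
    cmd: List[str],
    cycles: int,
    run_test: Optional[Callable[[List[str]], object]] = None,
    dmesg: Callable[[], str] = read_dmesg,
) -> List[int]:
    """Run `cmd` `cycles` times; return the 1-based cycles that logged a new hangcheck."""
    if run_test is None:
        run_test = lambda c: subprocess.run(c, check=False)
    failures = []
    seen = dmesg().count(MARKER)  # hangchecks already in the log before we start
    for cycle in range(1, cycles + 1):
        run_test(cmd)
        now = dmesg().count(MARKER)
        if now > seen:
            failures.append(cycle)
        # Resync every cycle so a wrapped ring buffer cannot hide later errors.
        seen = now
    return failures


# Example (runs the real test binary, so it is left as a comment):
# print(failing_cycles(["./gem_ring_sync_copy"], 20))
```

The test runner and the log reader are passed in as callables so the loop can be exercised without touching real hardware.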
Comment 1 Daniel Vetter 2014-05-15 15:08:07 UTC
Sounds like a regression between -fixes and -nightly. Can you try to bisect this?
Comment 2 lu hua 2014-05-16 07:26:46 UTC
I will try to bisect it.
Comment 3 lu hua 2014-05-19 08:05:49 UTC
Bisected it (ran each commit for 20 cycles).
Bisect shows b8866ef82d4f6c5361c8edd5199fe0cd19b947e3 is the first bad commit
commit b8866ef82d4f6c5361c8edd5199fe0cd19b947e3
Author:     Daniel Vetter <daniel.vetter@ffwll.ch>
AuthorDate: Thu Apr 24 23:54:40 2014 +0200
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Mon May 5 10:56:57 2014 +0200

    drm/i915/tv: extract set_color_conversion

    intel_tv_mode_set is still too bug.

    Reviewed-by: Imre Deak <imre.deak@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Comment 4 Daniel Vetter 2014-05-19 08:31:33 UTC
Pretty sure that bisect is bogus - it changes a bit of kernel code which is never run on ilk. Please try to redo the bisect.
Comment 5 Daniel Vetter 2014-05-21 06:33:57 UTC
Since this is tricky to reproduce please first double-check the bisected commit:
- Re-run b8866ef82d4f6c5361c8edd5199fe0cd19b947e3 and check that it is indeed bad.
- Re-run the parent commit, i.e. 8cb92203bf223053ab6044211cfffe7b674cf526 and make really sure it works well.
Comment 6 lu hua 2014-05-23 03:36:52 UTC
(In reply to comment #5)
> Since this is tricky to reproduce please first double-check the bisected
> commit:
> - Re-run b8866ef82d4f6c5361c8edd5199fe0cd19b947e3 and check that it is
> indeed bad.
> - Re-run the parent commit, i.e. 8cb92203bf223053ab6044211cfffe7b674cf526
> and make really sure it works well.


The bisect result is incorrect.
Running 8cb92203bf223053ab6044211cfffe7b674cf526 (the parent commit), it fails on the 29th run.
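
A rough calculation shows why 20 cycles per commit can mislead a bisect of a failure this rare. This is only a sketch under assumptions not stated in the thread: runs are treated as independent, and the failure rate is taken as a constant 0.1 from the "1 in 10 runs" estimate in the description. The function names are illustrative.

```python
import math


def p_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that a bad commit passes `runs` times in a row."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest run count that catches a bad commit with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Chance that 20 clean cycles wrongly mark a bad commit as good.
print(round(p_all_pass(0.1, 20), 3))  # 0.122

# Cycles per commit needed to catch the failure 99% of the time.
print(runs_needed(0.1, 0.99))  # 44
```

Under these assumptions, roughly one bad commit in eight would be marked good after 20 cycles, which is consistent with the parent commit only failing on its 29th run.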
Comment 7 lu hua 2014-06-13 03:04:33 UTC
We found that many igt cases randomly hit this error: <3>[   78.814855] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... bsd ring idle

Run core_getclient
[root@x-pk1 tests]# ./core_getclient
IGT-Version: 1.7-g27d37a1 (x86_64) (Linux: 3.15.0-rc8_drm-intel-fixes_ce9557_20140612+ x86_64)
[root@x-pk1 tests]# dmesg -r | egrep "<[1-3]>" |grep drm
[root@x-pk1 tests]# ./core_getclient
IGT-Version: 1.7-g27d37a1 (x86_64) (Linux: 3.15.0-rc8_drm-intel-fixes_ce9557_20140612+ x86_64)
[root@x-pk1 tests]# dmesg -r | egrep "<[1-3]>" |grep drm
<3>[  222.829544] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... bsd ring idle
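
The shell filter used above (dmesg -r | egrep "<[1-3]>" | grep drm) can be mirrored in Python when the log has been saved to a file. This is a sketch only: the function name drm_errors is hypothetical, and the <6> and <7> prefixes in the sample are illustrative priority levels, not taken from the attached logs.

```python
import re
from typing import Iterable, List

# Same pattern as the egrep in the transcript: raw priority 1 to 3.
PRIORITY = re.compile(r"<[1-3]>")


def drm_errors(lines: Iterable[str]) -> List[str]:
    """Keep raw dmesg lines with priority <1> to <3> that mention drm."""
    return [ln for ln in lines if PRIORITY.search(ln) and "drm" in ln]


sample = [
    "<3>[  222.829544] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... bsd ring idle",
    "<7>[  266.671526] [drm:i915_gem_open]",        # assumed debug-level prefix
    "<6>[  269.814505] core_getclient: executing",  # assumed info-level prefix
]

for line in drm_errors(sample):
    print(line)
```

Like the shell pipeline, the pattern is searched anywhere in the line rather than anchored to the start.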
Comment 8 Chris Wilson 2014-09-06 11:29:27 UTC
Candidate for testing: 

git://people.freedesktop.org/~ickle/linux-2.6 requests

(the seqno to request rework)
Comment 9 lu hua 2014-09-09 06:50:18 UTC
(In reply to comment #8)
> Candidate for testing: 
> 
> git://people.freedesktop.org/~ickle/linux-2.6 requests
> 
> (the seqno to request rework)

Tested on branch requests, commit 299421c8f49d90fd24d4bdacd01cb91b3606a95a.
Ran ./gem_ring_sync_copy for 20 cycles; the error did not occur.

Ran ./core_getclient in cycles; the error occurred once.
[  266.671526] [drm:i915_gem_open]
[  266.671529] [drm:i915_gem_object_create_stolen] creating stolen object: size=1000
[  266.671532] [drm:i915_pages_create_for_stolen] offset=0x82a000, size=4096
[  266.671613] [drm:i915_gem_open]
[  266.671615] [drm:i915_gem_object_create_stolen] creating stolen object: size=1000
[  266.671616] [drm:i915_pages_create_for_stolen] offset=0x82b000, size=4096
[  269.806762] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... bsd ring idle
[  269.806925] [drm:intel_crtc_set_config] [CRTC:8] [FB:65] #connectors=1 (x y) (0 0)
[  269.806931] [drm:intel_set_config_compute_mode_changes] computed changes for [CRTC:8], mode_changed=0, fb_changed=0
[  269.806933] [drm:intel_modeset_stage_output_state] [CONNECTOR:14:VGA-1] to [CRTC:8]
[  269.806934] [drm:intel_modeset_stage_output_state] [CONNECTOR:30:DP-3] to [CRTC:12]
[  269.806937] [drm:intel_crtc_set_config] [CRTC:12] [FB:65] #connectors=1 (x y) (0 0)
[  269.806939] [drm:intel_set_config_compute_mode_changes] computed changes for [CRTC:12], mode_changed=0, fb_changed=0
[  269.806940] [drm:intel_modeset_stage_output_state] [CONNECTOR:14:VGA-1] to [CRTC:8]
[  269.806941] [drm:intel_modeset_stage_output_state] [CONNECTOR:30:DP-3] to [CRTC:12]
[  269.814505] core_getclient: executing
[  269.815072] [drm:i915_gem_open]
[  269.815074] [drm:i915_gem_object_create_stolen] creating stolen object: size=1000
[  269.815077] [drm:i915_pages_create_for_stolen] offset=0x82a000, size=4096
[  269.815166] [drm:i915_gem_open]
[  269.815168] [drm:i915_gem_object_create_stolen] creating stolen object: size=1000
[  269.815170] [drm:i915_pages_create_for_stolen] offset=0x82b000, size=4096
[  269.815300] [drm:intel_crtc_set_config] [CRTC:8] [FB:65] #connectors=1 (x y) (0 0)
[  269.815304] [drm:intel_set_config_compute_mode_changes] computed changes for [CRTC:8], mode_changed=0, fb_changed=0
[  269.815306] [drm:intel_modeset_stage_output_state] [CONNECTOR:14:VGA-1] to [CRTC:8]
[  269.815307] [drm:intel_modeset_stage_output_state] [CONNECTOR:30:DP-3] to [CRTC:12]
[  269.815309] [drm:intel_crtc_set_config] [CRTC:12] [FB:65] #connectors=1 (x y) (0 0)
[  269.815311] [drm:intel_set_config_compute_mode_changes] computed changes for [CRTC:12], mode_changed=0, fb_changed=0
[  269.815313] [drm:intel_modeset_stage_output_state] [CONNECTOR:14:VGA-1] to [CRTC:8]
[  269.815314] [drm:intel_modeset_stage_output_state] [CONNECTOR:30:DP-3] to [CRTC:12]
[  269.823407] core_getclient: executing
[  269.823965] [drm:i915_gem_open]
[  269.823968] [drm:i915_gem_object_create_stolen] creating stolen object: size=1000
[  269.823970] [drm:i915_pages_create_for_stolen] offset=0x82a000, size=4096
[  269.824052] [drm:i915_gem_open]
[  269.824054] [drm:i915_gem_object_create_stolen] creating stolen object: size=1000
[  269.824056] [drm:i915_pages_create_for_stolen] offset=0x82b000, size=4096
[  269.824125] [drm:intel_crtc_set_config] [CRTC:8] [FB:65] #connectors=1 (x y) (0 0)
[  269.824128] [drm:intel_set_config_compute_mode_changes] computed changes for [CRTC:8], mode_changed=0, fb_changed=0
[  269.824130] [drm:intel_modeset_stage_output_state] [CONNECTOR:14:VGA-1] to [CRTC:8]
[  269.824131] [drm:intel_modeset_stage_output_state] [CONNECTOR:30:DP-3] to [CRTC:12]
[  269.824133] [drm:intel_crtc_set_config] [CRTC:12] [FB:65] #connectors=1 (x y) (0 0)
[  269.824135] [drm:intel_set_config_compute_mode_changes] computed changes for [CRTC:12], mode_changed=0, fb_changed=0
[  269.824136] [drm:intel_modeset_stage_output_state] [CONNECTOR:14:VGA-1] to [CRTC:8]
[  269.824137] [drm:intel_modeset_stage_output_state] [CONNECTOR:30:DP-3] to [CRTC:12]
Comment 10 Chris Wilson 2014-09-09 07:00:51 UTC
Ok, the bug is still present. Oh well, I was being hopeful.
Comment 11 Chris Wilson 2014-09-09 07:04:07 UTC
I am pretty sure it is not a regression (it looks like a random hw issue), and we have a kernel workaround in place should this misbehaviour impact real workarounds.
Comment 12 Chris Wilson 2014-09-09 07:04:34 UTC
s/real workarounds/real workloads/
Comment 13 lu hua 2014-11-11 06:17:30 UTC
We are still confused by this bug. The result is unstable and it affects most igt cases.
 
[root@x-pk5 tests]# ./core_getversion
IGT-Version: 1.8-gc049c39 (x86_64) (Linux: 3.18.0-rc3_drm-intel-nightly_de6d6c_20141111+ x86_64)
[root@x-pk5 tests]# dmesg -r|egrep "<[1-4]>"|grep drm
<3>[  112.807912] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... bsd ring idle
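
The hangcheck message appears in slightly different forms across this report: with or without a raw <N> priority prefix, and with or without the " [i915]" module suffix after the function name. The following sketch parses all three forms seen here. The Hangcheck type and parse_hangcheck name are hypothetical, and the pattern is written only against the lines quoted in this bug, so other kernels may format the message differently.

```python
import re
from typing import NamedTuple, Optional


class Hangcheck(NamedTuple):
    level: Optional[int]  # raw priority from `dmesg -r`, if present
    timestamp: float      # seconds since boot
    ring: str             # ring reported idle, e.g. "bsd"


PATTERN = re.compile(
    r"^(?:<(?P<level>\d+)>)?\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:i915_hangcheck_elapsed(?: \[i915\])?\] \*ERROR\* "
    r"Hangcheck timer elapsed\.\.\. (?P<ring>\w+) ring idle"
)


def parse_hangcheck(line: str) -> Optional[Hangcheck]:
    """Return the parsed hangcheck record, or None for any other log line."""
    m = PATTERN.match(line.strip())
    if m is None:
        return None
    level = int(m.group("level")) if m.group("level") else None
    return Hangcheck(level, float(m.group("ts")), m.group("ring"))


print(parse_hangcheck(
    "<3>[  112.807912] [drm:i915_hangcheck_elapsed [i915]] *ERROR* "
    "Hangcheck timer elapsed... bsd ring idle"
))
```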
Comment 14 lu hua 2015-04-10 01:56:55 UTC
Created attachment 115002 [details]
dmesg(kernel 333cf6_20150403)

Ran ./core_getversion for 20 cycles; the error still occurs.
Comment 15 cprigent 2015-10-08 16:39:04 UTC
Bug scrub:
Lower priority as it deals with IRL.
Assigned to Christophe to check if still reproduced
Comment 16 cprigent 2016-02-25 16:09:46 UTC
I ran gem_ring_sync_copy and core_getversion 30 times each and do not see "*ERROR* Hangcheck timer elapsed... bsd ring idle" in the kernel log.

Not reproduced with kernel 4.3-rc4
Comment 17 cprigent 2016-02-25 16:10:16 UTC
So closed as not reproduced
Comment 18 Chris Wilson 2016-02-27 20:00:19 UTC
Never fixed, just undertested.
Comment 19 Chris Wilson 2016-02-27 20:00:28 UTC

*** This bug has been marked as a duplicate of bug 94307 ***
Comment 20 cprigent 2016-10-11 12:23:58 UTC
My ILK does not work at all.
Assigned to Ricardo to check if the test is possible on your side.
Comment 21 Nobody 2016-10-13 22:53:14 UTC
We do not have an ILK locally.
Comment 22 yann 2016-10-14 07:32:11 UTC
Closing as duplicate closed + fixed (see comment #18 on bug 94307)
