Bug 76304 - [BDW Bisected]igt/gem_dummy_reloc_loop/blt causes [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... blitter ring idle
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-18 07:32 UTC by lu hua
Modified: 2017-10-06 14:39 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg (95.58 KB, text/plain)
2014-03-18 07:32 UTC, lu hua

Description lu hua 2014-03-18 07:32:51 UTC
Created attachment 95982
dmesg

System Environment:
--------------------------
Platform: Broadwell
kernel: (drm-intel-nightly)b1859622badb7509586987af5269aa525a0c112f

Bug detailed description:
------------------------- 
Running ./gem_dummy_reloc_loop --run-subtest blt produces this error in dmesg: <3>[  410.576752] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... blitter ring idle
It happens on Broadwell with both the -nightly and -queued kernels.
igt/gem_dummy_reloc_loop/mixed also has this error.

The latest known good commit: b6ae3c7c60161a9b1e15b1ccd6412fad65b7d9cf
The latest known bad commit: b2040f6fed736ccd2319768bc59833abe74148b8

output:
IGT-Version: 1.6-g10571b8 (x86_64) (Linux: 3.14.0-rc6_drm-intel-nightly_b18596_20140314+ x86_64)
running dummy loop on blt
dummy loop run on blt completed
Subtest blt: SUCCESS

Reproduce steps:
---------------------------- 
1. ./gem_dummy_reloc_loop --run-subtest blt
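
The subtest itself prints SUCCESS, so the failure is only visible in the kernel log. A minimal sketch of how the log could be checked automatically after a run is shown below. The function name and the regular expression are illustrative assumptions built from the message quoted in this report; they are not part of intel-gpu-tools.

```python
import re

# Pattern for the hangcheck error quoted in this report. The text after
# "elapsed..." names the affected ring, e.g. "blitter ring idle".
# (Assumption: other rings log the same prefix with a different suffix.)
HANGCHECK_RE = re.compile(
    r"\[drm:i915_hangcheck_elapsed\] \*ERROR\* "
    r"Hangcheck timer elapsed\.\.\. (?P<detail>.+)"
)


def find_hangcheck_errors(dmesg_text):
    """Return the detail string of every hangcheck error in dmesg_text."""
    return [
        match.group("detail").strip().rstrip(".")
        for match in HANGCHECK_RE.finditer(dmesg_text)
    ]


# The log line reported above, used as sample input.
sample = (
    "<3>[  410.576752] [drm:i915_hangcheck_elapsed] *ERROR* "
    "Hangcheck timer elapsed... blitter ring idle\n"
)
print(find_hangcheck_errors(sample))
```

On a test machine the input could come from the output of the `dmesg` command captured after the subtest finishes; an empty list would then mean no hangcheck error was logged.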
Comment 1 Chris Wilson 2014-03-18 07:35:23 UTC
Please bisect.
Comment 2 lu hua 2014-03-19 07:33:56 UTC
Bisect shows: 8232644ccf099548710843e97360a3fcd6d28e04 is the first bad commit
commit 8232644ccf099548710843e97360a3fcd6d28e04
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 5 12:00:39 2014 +0000

    drm/i915: Convert the forcewake worker into a timer func

    We don't want to suffer scheduling delay when turning off the GPU after
    waking it up to touch registers. Ideally, we only want to keep the GPU
    awake for the register access sequence, with a single forcewake dance on
    the first access and release immediately after the last. We set a timer
    on the first access so that we only dance once and on the next scheduler
    tick, we drop the forcewake again.

    This moves the cleanup routine from the common i915 workqueue to a timer
    func so that we don't anger powertop, and drop the forcewake again
    quicker.

    v2: Enable the deferred force_wake_put for regular register reads as
        well.
    v3: Beautification and make sure we disable forcewake when shutting
        down.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Ben Widawsky <ben@bwidawsk.net>
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
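
The behaviour described in the commit message above can be illustrated with a toy model. This is only a sketch of the idea (wake once on the first register access, release on the next timer tick); the class and method names are hypothetical and it is not the i915 implementation.

```python
class ForcewakeModel:
    """Toy model of deferred forcewake release; not the i915 code."""

    def __init__(self):
        self.awake = False        # is the GPU currently held awake?
        self.timer_armed = False  # is a release pending on the next tick?
        self.dances = 0           # number of wake-up sequences performed

    def register_access(self):
        # Only the first access after a release pays for the wake-up dance.
        if not self.awake:
            self.awake = True
            self.dances += 1
        # Make sure a release is pending; later accesses reuse it.
        self.timer_armed = True

    def tick(self):
        # Timer callback: drop forcewake on the next scheduler tick.
        if self.timer_armed:
            self.timer_armed = False
            self.awake = False


model = ForcewakeModel()
for _ in range(3):
    model.register_access()  # three back-to-back accesses, one dance
model.tick()                 # the tick releases forcewake again
```

In this model a burst of accesses costs a single wake-up, and the next access after a tick has to wake the GPU again.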
Comment 3 Chris Wilson 2014-03-19 07:48:50 UTC
Can you please attach the output (trace.dat) of:

$ trace-cmd record -e i915 ./gem_dummy_reloc_loop --run-subtest blt
Comment 4 lu hua 2014-03-21 03:14:09 UTC
(In reply to comment #3)
> Can you please attach the output (trace.dat) of:
> 
> $ trace-cmd record -e i915 ./gem_dummy_reloc_loop --run-subtest blt

/sys/kernel/debug/tracing/events/i915/filter
/sys/kernel/debug/tracing/events/*/i915/filter
IGT-Version: 1.6-gcde058a (x86_64) (Linux: 3.14.0-rc7_drm-intel-nightly_2c0d38_2
running dummy loop on render
dummy loop run on render completed
Subtest render: SUCCESS
running dummy loop on bsd
dummy loop run on bsd completed
Subtest bsd: SUCCESS
running dummy loop on blt
dummy loop run on blt completed
Subtest blt: SUCCESS
running dummy loop on vebox
Connection to x-bdw02 closed by remote host.
Connection to x-bdw02 closed.
[root@x-pk2 ~]# ssh x-bdw02
Last login: Thu Dec 12 00:01:05 2013 from luhua.sh.intel.com
[root@x-bdw02 ~]# uname -a
Linux x-bdw02 3.13.0_drm-intel-next-queued_823264_20140314_debug+ #960 SMP Fri Mar 14 02:31:27 CST 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@x-bdw02 ~]# cd /GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests/
[root@x-bdw02 tests]# trace-cmd record -e i915 ./gem_dummy_reloc_loop
/sys/kernel/debug/tracing/events/i915/filter
/sys/kernel/debug/tracing/events/*/i915/filter
IGT-Version: 1.6-gcde058a (x86_64) (Linux: 3.13.0_drm-intel-next-queued_823264_20140314_debug+ x86_64)
running dummy loop on render
dummy loop run on render completed
Subtest render: SUCCESS
running dummy loop on bsd
dummy loop run on bsd completed
Subtest bsd: SUCCESS
running dummy loop on blt
dummy loop run on blt completed
Subtest blt: SUCCESS
running dummy loop on vebox
dummy loop run on vebox completed
Subtest vebox: SUCCESS
running dummy loop on random rings
dummy loop run on random rings completed
Subtest mixed: SUCCESS
Kernel buffer statistics:
  Note: "entries" are the entries left in the kernel ring buffer and are not
        recorded in the trace data. They should all be zero.

CPU: 0
entries: 0
overrun: 0
commit overrun: 0
bytes: 3868
oldest event ts:   522.073751
now ts:   522.154184
dropped events: 0
read events: 68516225

CPU: 1
entries: 0
overrun: 0
commit overrun: 0
bytes: 1800
oldest event ts:   521.657562
now ts:   522.154479
dropped events: 0
read events: 6934134

CPU: 2
entries: 0
overrun: 0
commit overrun: 0
bytes: 304
oldest event ts:   520.351796
now ts:   522.154750
dropped events: 0
read events: 15421150

CPU: 3
entries: 0
overrun: 0
commit overrun: 0
bytes: 780
oldest event ts:   506.852074
now ts:   522.155017
dropped events: 0
read events: 2452128

CPU0 data recorded at offset=0x33a000
    1883295744 bytes in size
CPU1 data recorded at offset=0x70747000
    190619648 bytes in size
CPU2 data recorded at offset=0x7bd11000
    423923712 bytes in size
CPU3 data recorded at offset=0x9515a000
    67411968 bytes in size
Comment 5 Chris Wilson 2014-03-21 07:55:06 UTC
Attach the trace.dat
Comment 6 lu hua 2014-03-24 06:05:43 UTC
(In reply to comment #5)
> Attach the trace.dat

The file is too large. I will give you file location via mail.
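
For reference, the per-CPU sizes printed by trace-cmd in comment 4 can be added up to estimate the size of trace.dat. The sketch below parses the summary lines as they appear in that comment; the helper name is an illustrative assumption, and the total ignores the small file header before the first CPU offset.

```python
import re

# Per-CPU size lines copied from the trace-cmd output in comment 4.
TRACE_SUMMARY = """\
CPU0 data recorded at offset=0x33a000
    1883295744 bytes in size
CPU1 data recorded at offset=0x70747000
    190619648 bytes in size
CPU2 data recorded at offset=0x7bd11000
    423923712 bytes in size
CPU3 data recorded at offset=0x9515a000
    67411968 bytes in size
"""


def total_trace_bytes(summary):
    """Sum every '<N> bytes in size' figure in a trace-cmd summary."""
    return sum(int(n) for n in re.findall(r"(\d+) bytes in size", summary))


total = total_trace_bytes(TRACE_SUMMARY)
print(f"{total} bytes = {total / 2**30:.2f} GiB")
# prints "2565251072 bytes = 2.39 GiB"
```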
Comment 7 Guang Yang 2014-05-17 01:18:06 UTC
Chris, any update on this issue?
Comment 8 Chris Wilson 2014-05-19 06:46:46 UTC
Nope. The debug perturbed the system enough to hide the issue. So far it looks like a genuine missed interrupt.
Comment 9 Ben Widawsky 2014-05-28 19:21:21 UTC
I can't reproduce, still an issue?
Comment 10 lu hua 2014-05-29 06:30:02 UTC
Tested on the latest -nightly kernel; it works well.
Comment 11 lu hua 2014-05-29 06:30:16 UTC
Verified. Fixed.
Comment 12 Elizabeth 2017-10-06 14:39:21 UTC
Closing old verified.

