Bug 99130 - [IVB] CONFIG_PROVE_LOCKING=y causes igt/gem_exec/basic-wait-all (and possibly others) to fail
Summary: [IVB] CONFIG_PROVE_LOCKING=y causes igt/gem_exec/basic-wait-all (and possibly...
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: IGT (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) All
: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-17 16:48 UTC by Dorota Czaplejewicz
Modified: 2017-09-14 22:38 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
basic-wait-all log (1.50 KB, text/plain)
2016-12-17 16:48 UTC, Dorota Czaplejewicz
no flags Details
basic-wait-all dmesg log (92.40 KB, text/plain)
2016-12-17 16:49 UTC, Dorota Czaplejewicz
no flags Details
Dmesg log with applied patch from comment #3 (124.37 KB, text/plain)
2016-12-19 13:52 UTC, Dorota Czaplejewicz
no flags Details
igt/gem_sync/basic-each log (618 bytes, text/x-c)
2016-12-19 13:56 UTC, Dorota Czaplejewicz
no flags Details
igt/gem_sync/basic-each - dmesg (truncated beginning) (17.79 KB, text/plain)
2016-12-19 14:00 UTC, Dorota Czaplejewicz
no flags Details
igt/gem_sync/basic-store-each - log (637 bytes, text/x-c)
2016-12-19 14:00 UTC, Dorota Czaplejewicz
no flags Details
igt/gem_sync/basic-store-each - dmesg (truncated beginning) (29.37 KB, text/plain)
2016-12-19 14:01 UTC, Dorota Czaplejewicz
no flags Details
ftrace call graph log for i915_gem_wait_ioctl (11.79 KB, text/x-c)
2016-12-21 14:50 UTC, Dorota Czaplejewicz
no flags Details

Description Dorota Czaplejewicz 2016-12-17 16:48:01 UTC
Created attachment 128524 [details]
basic-wait-all log

Kernels built with the default CI config (e.g. https://intel-gfx-ci.01.org/CI/CI_DRM_1968/kernel.config.bz2 ) will trigger failures in the test
igt/gem_exec/basic-wait-all
The failures will rarely appear immediately; for faithful reproduction, running the test several times consecutively is needed (at least 4 times). Afterwards, the failures will be constant.

Changing CONFIG_PROVE_LOCKING from "=y" to "=n" will stop the failures.
Comment 1 Dorota Czaplejewicz 2016-12-17 16:49:09 UTC
Created attachment 128525 [details]
basic-wait-all dmesg log
Comment 2 Dorota Czaplejewicz 2016-12-17 16:52:48 UTC
Software:
Fedora 24
kernel revision: 5415795 (drm-tip: 2016y-12m-16d-21h-26m-45s UTC integration manifest)
kernel config: https://intel-gfx-ci.01.org/CI/CI_DRM_1968/kernel.config.bz2

Hardware:
Intel i7-3770
Comment 3 Chris Wilson 2016-12-17 17:42:49 UTC
Looks like the timeout value isn't being computed properly (possibly just that too much time is unaccounted?). Can you apply:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1b2f18d0ca2b..dfe751b350ad 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3019,6 +3019,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
        struct drm_i915_gem_wait *args = data;
        struct drm_i915_gem_object *obj;
+       u64 prev_timeout = args->timeout_ns;
        ktime_t start;
        long ret;
 
@@ -3042,6 +3043,12 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
                        args->timeout_ns = 0;
        }
 
+       pr_info("%s: timeout %d.%09d -> %d.%09d\n", __func__,
+               prev_timeout < 0 ? -1 : ns_to_timespec(prev_timeout).tv_sec,
+               prev_timeout < 0 ? 0 : ns_to_timespec(prev_timeout).tv_nsec,
+               args->timeout_ns < 0 ? -1 : ns_to_timespec(args->timeout_ns).tv_sec,
+               args->timeout_ns < 0 ? 0 : ns_to_timespec(args->timeout_ns).tv_nsec);
+
        i915_gem_object_put(obj);
        return ret;
 }

and get the output for a failure?
Comment 4 Dorota Czaplejewicz 2016-12-19 13:52:35 UTC
Created attachment 128544 [details]
Dmesg log with applied patch from comment #3
Comment 5 Dorota Czaplejewicz 2016-12-19 13:55:39 UTC
The same config option triggers failures in a similar way in:

igt/gem_sync/basic-each
igt/gem_sync/basic-store-each
Comment 6 Dorota Czaplejewicz 2016-12-19 13:56:28 UTC
Created attachment 128545 [details]
igt/gem_sync/basic-each log
Comment 7 Dorota Czaplejewicz 2016-12-19 14:00:07 UTC
Created attachment 128546 [details]
igt/gem_sync/basic-each - dmesg (truncated beginning)
Comment 8 Dorota Czaplejewicz 2016-12-19 14:00:36 UTC
Created attachment 128547 [details]
igt/gem_sync/basic-store-each - log
Comment 9 Dorota Czaplejewicz 2016-12-19 14:01:02 UTC
Created attachment 128548 [details]
igt/gem_sync/basic-store-each - dmesg (truncated beginning)
Comment 10 Chris Wilson 2016-12-19 14:06:58 UTC
[  154.074065] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT
[  154.074748] i915_gem_wait_ioctl: timeout 0.500000000 -> 0.499993351
[  154.075390] [drm:drm_ioctl] ret = -512
...
[  155.072528] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT
[  155.073234] i915_gem_wait_ioctl: timeout 0.488539596 -> 0.488535373
[  155.073916] [drm:drm_ioctl] ret = -512
[  155.074605] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT
[  155.075307] i915_gem_wait_ioctl: timeout 0.488535373 -> 0.488532529

Walltime of 1s, but actual wait time of ~12ms.

Each ioctl is taking ~1.4ms (inside drmIoctl) with a further interval of ~0.7ms between ioctls. And we account for 3us.
Comment 11 Dorota Czaplejewicz 2016-12-21 14:50:00 UTC
Created attachment 128606 [details]
ftrace call graph log for i915_gem_wait_ioctl

The call log capturing i915_gem_wait_ioctl invokation and overhead.
The time of ioctl itself (20ns) is dwarfed by a slowdown somewhere in console_unlock() related to printk() calls (650ns)
Comment 12 Chris Wilson 2016-12-21 15:05:24 UTC
cat /proc/consoles?
Comment 13 Chris Wilson 2016-12-21 15:06:45 UTC
Also trying with CONFIG_DRM_FBDEV_EMULATION unset.
Comment 14 Dorota Czaplejewicz 2016-12-21 15:44:25 UTC
# cat /proc/consoles
tty0                 -WU (EC p  )    4:1

Upon further digging, the slow stack *seems* like this (may be wrong, the tracing tool is less useful with huge numbers of calls):

console_unlock()
 call_console_drivers.isra.0.constprop.15()
  vt_console_print()
   lf()
    scrup()
     fbcon_scroll()
      fbcon_cursor()
       fbcon_redraw.isra.17()

Inside of this, the delay is split up among many shorter calls, most time occupied by fbcon_putcs -> bit_putcs -> cfb_imageblit.
Comment 15 Chris Wilson 2016-12-21 15:55:51 UTC
That's us! Woohoo \o/

What does cat /sys/kernel/debug/dri/0/i915_gem_framebuffer report? Mainly interested in knowing if you have a tiled + fenced fbcon.
Comment 16 Dorota Czaplejewicz 2016-12-21 16:17:33 UTC
Here it is:

$ sudo cat /sys/kernel/debug/dri/0/i915_gem_framebuffer

fbcon size: 1024 x 768, depth 24, 32 bpp, modifier 0x0, refcount 1, obj ffff880221116200:  p        3072KiB 41 00  uncached (pinned x 1) (display) (ggtt offset: 00080000, size: 00300000, type: 0)
Comment 17 Chris Wilson 2016-12-21 16:21:06 UTC
Try:

diff --git a/lib/igt_core.c b/lib/igt_core.c
index a18a728c..403b9423 100644
--- a/lib/igt_core.c
+++ b/lib/igt_core.c
@@ -67,6 +67,7 @@
 
 #include "igt_core.h"
 #include "igt_aux.h"
+#include "igt_sysfs.h"
 
 #ifdef HAVE_LIBGEN_H
 #include <libgen.h>   /* for basename() on Solaris */
@@ -512,6 +513,7 @@ static void common_exit_handler(int sig)
 {
        if (!igt_only_list_subtests()) {
                low_mem_killer_disable(false);
+               kick_fbcon(true);
        }
 
        /* When not killed by a signal check that igt_exit() has been properly
@@ -731,6 +733,7 @@ out:
                exit(ret == -1 ? 0 : IGT_EXIT_INVALID);
 
        if (!list_subtests) {
+               kick_fbcon(false);
                kmsg(KERN_INFO "[IGT] %s: executing\n", command_str);
                print_version();
Comment 18 Dorota Czaplejewicz 2016-12-21 17:30:18 UTC
After adding the kick_fbcon() calls, the problem goes away, even without rebooting.
Comment 19 Dorota Czaplejewicz 2017-01-25 15:05:50 UTC
Should this be turned into a patch for igt or should I try to identify the underlying issue causing the slowness?
Comment 20 Dorota Czaplejewicz 2017-01-25 16:58:06 UTC
I just realized the patch was merged. Should this bug be considered fixed now?
Comment 21 Chris Wilson 2017-01-25 17:03:48 UTC
I kept the bug around in case we wanted to dig deeper into what the slowdown actually was. If it's below the level of care, lets close and move on.
Comment 22 Dorota Czaplejewicz 2017-01-27 10:11:35 UTC
I haven't seen the underlying slowness causing problems anywhere else yet. Perhaps we can close this and reopen if it reappears.
Comment 23 Elizabeth 2017-09-14 22:38:15 UTC
Closing. Please reopen if needed. Thanks.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.