Bug 99130

Summary:

[IVB] CONFIG_PROVE_LOCKING=y causes igt/gem_exec/basic-wait-all (and possibly others) to fail

Product:

DRI

Reporter:

Dorota Czaplejewicz <dorota.czaplejewicz>

Component:

IGT

Assignee:

Default DRI bug account <dri-devel>

Status:

CLOSED WORKSFORME

QA Contact:

Severity:

normal

Priority:

medium

Version:

unspecified

Hardware:

x86-64 (AMD64)

OS:

All

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
basic-wait-all log	none
basic-wait-all dmesg log	none
Dmesg log with applied patch from comment #3	none
igt/gem_sync/basic-each log	none
igt/gem_sync/basic-each - dmesg (truncated beginning)	none
igt/gem_sync/basic-store-each - log	none
igt/gem_sync/basic-store-each - dmesg (truncated beginning)	none
ftrace call graph log for i915_gem_wait_ioctl	none

Description Dorota Czaplejewicz 2016-12-17 16:48:01 UTC

Created attachment 128524 [details]
basic-wait-all log

Kernels built with the default CI config (e.g. https://intel-gfx-ci.01.org/CI/CI_DRM_1968/kernel.config.bz2 ) will trigger failures in the test
igt/gem_exec/basic-wait-all
The failures will rarely appear immediately; for faithful reproduction, running the test several times consecutively is needed (at least 4 times). Afterwards, the failures will be constant.

Changing CONFIG_PROVE_LOCKING from "=y" to "=n" will stop the failures.

Comment 1 Dorota Czaplejewicz 2016-12-17 16:49:09 UTC

Created attachment 128525 [details]
basic-wait-all dmesg log

Comment 2 Dorota Czaplejewicz 2016-12-17 16:52:48 UTC

Software:
Fedora 24
kernel revision: 5415795 (drm-tip: 2016y-12m-16d-21h-26m-45s UTC integration manifest)
kernel config: https://intel-gfx-ci.01.org/CI/CI_DRM_1968/kernel.config.bz2

Hardware:
Intel i7-3770

Comment 3 Chris Wilson 2016-12-17 17:42:49 UTC

Looks like the timeout value isn't being computed properly (possibly just that too much time is unaccounted?). Can you apply:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1b2f18d0ca2b..dfe751b350ad 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3019,6 +3019,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
        struct drm_i915_gem_wait *args = data;
        struct drm_i915_gem_object *obj;
+       u64 prev_timeout = args->timeout_ns;
        ktime_t start;
        long ret;
 
@@ -3042,6 +3043,12 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
                        args->timeout_ns = 0;
        }
 
+       pr_info("%s: timeout %d.%09d -> %d.%09d\n", __func__,
+               prev_timeout < 0 ? -1 : ns_to_timespec(prev_timeout).tv_sec,
+               prev_timeout < 0 ? 0 : ns_to_timespec(prev_timeout).tv_nsec,
+               args->timeout_ns < 0 ? -1 : ns_to_timespec(args->timeout_ns).tv_sec,
+               args->timeout_ns < 0 ? 0 : ns_to_timespec(args->timeout_ns).tv_nsec);
+
        i915_gem_object_put(obj);
        return ret;
 }

and get the output for a failure?

Comment 4 Dorota Czaplejewicz 2016-12-19 13:52:35 UTC

Created attachment 128544 [details]
Dmesg log with applied patch from comment #3

Comment 5 Dorota Czaplejewicz 2016-12-19 13:55:39 UTC

The same config option triggers failures in a similar way in:

igt/gem_sync/basic-each
igt/gem_sync/basic-store-each

Comment 6 Dorota Czaplejewicz 2016-12-19 13:56:28 UTC

Created attachment 128545 [details]
igt/gem_sync/basic-each log

Comment 7 Dorota Czaplejewicz 2016-12-19 14:00:07 UTC

Created attachment 128546 [details]
igt/gem_sync/basic-each - dmesg (truncated beginning)

Comment 8 Dorota Czaplejewicz 2016-12-19 14:00:36 UTC

Created attachment 128547 [details]
igt/gem_sync/basic-store-each - log

Comment 9 Dorota Czaplejewicz 2016-12-19 14:01:02 UTC

Created attachment 128548 [details]
igt/gem_sync/basic-store-each - dmesg (truncated beginning)

Comment 10 Chris Wilson 2016-12-19 14:06:58 UTC

[  154.074065] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT
[  154.074748] i915_gem_wait_ioctl: timeout 0.500000000 -> 0.499993351
[  154.075390] [drm:drm_ioctl] ret = -512
...
[  155.072528] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT
[  155.073234] i915_gem_wait_ioctl: timeout 0.488539596 -> 0.488535373
[  155.073916] [drm:drm_ioctl] ret = -512
[  155.074605] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT
[  155.075307] i915_gem_wait_ioctl: timeout 0.488535373 -> 0.488532529

Walltime of 1s, but actual wait time of ~12ms.

Each ioctl is taking ~1.4ms (inside drmIoctl) with a further interval of ~0.7ms between ioctls. And we account for 3us.

Comment 11 Dorota Czaplejewicz 2016-12-21 14:50:00 UTC

Created attachment 128606 [details]
ftrace call graph log for i915_gem_wait_ioctl

The call log capturing i915_gem_wait_ioctl invokation and overhead.
The time of ioctl itself (20ns) is dwarfed by a slowdown somewhere in console_unlock() related to printk() calls (650ns)

Comment 12 Chris Wilson 2016-12-21 15:05:24 UTC

cat /proc/consoles?

Comment 13 Chris Wilson 2016-12-21 15:06:45 UTC

Also trying with CONFIG_DRM_FBDEV_EMULATION unset.

Comment 14 Dorota Czaplejewicz 2016-12-21 15:44:25 UTC

# cat /proc/consoles
tty0                 -WU (EC p  )    4:1

Upon further digging, the slow stack *seems* like this (may be wrong, the tracing tool is less useful with huge numbers of calls):

console_unlock()
 call_console_drivers.isra.0.constprop.15()
  vt_console_print()
   lf()
    scrup()
     fbcon_scroll()
      fbcon_cursor()
       fbcon_redraw.isra.17()

Inside of this, the delay is split up among many shorter calls, most time occupied by fbcon_putcs -> bit_putcs -> cfb_imageblit.

Comment 15 Chris Wilson 2016-12-21 15:55:51 UTC

That's us! Woohoo \o/

What does cat /sys/kernel/debug/dri/0/i915_gem_framebuffer report? Mainly interested in knowing if you have a tiled + fenced fbcon.

Comment 16 Dorota Czaplejewicz 2016-12-21 16:17:33 UTC

Here it is:

$ sudo cat /sys/kernel/debug/dri/0/i915_gem_framebuffer

fbcon size: 1024 x 768, depth 24, 32 bpp, modifier 0x0, refcount 1, obj ffff880221116200:  p        3072KiB 41 00  uncached (pinned x 1) (display) (ggtt offset: 00080000, size: 00300000, type: 0)

Comment 17 Chris Wilson 2016-12-21 16:21:06 UTC

Try:

diff --git a/lib/igt_core.c b/lib/igt_core.c
index a18a728c..403b9423 100644
--- a/lib/igt_core.c
+++ b/lib/igt_core.c
@@ -67,6 +67,7 @@
 
 #include "igt_core.h"
 #include "igt_aux.h"
+#include "igt_sysfs.h"
 
 #ifdef HAVE_LIBGEN_H
 #include <libgen.h>   /* for basename() on Solaris */
@@ -512,6 +513,7 @@ static void common_exit_handler(int sig)
 {
        if (!igt_only_list_subtests()) {
                low_mem_killer_disable(false);
+               kick_fbcon(true);
        }
 
        /* When not killed by a signal check that igt_exit() has been properly
@@ -731,6 +733,7 @@ out:
                exit(ret == -1 ? 0 : IGT_EXIT_INVALID);
 
        if (!list_subtests) {
+               kick_fbcon(false);
                kmsg(KERN_INFO "[IGT] %s: executing\n", command_str);
                print_version();

Comment 18 Dorota Czaplejewicz 2016-12-21 17:30:18 UTC

After adding the kick_fbcon() calls, the problem goes away, even without rebooting.

Comment 19 Dorota Czaplejewicz 2017-01-25 15:05:50 UTC

Should this be turned into a patch for igt or should I try to identify the underlying issue causing the slowness?

Comment 20 Dorota Czaplejewicz 2017-01-25 16:58:06 UTC

I just realized the patch was merged. Should this bug be considered fixed now?

Comment 21 Chris Wilson 2017-01-25 17:03:48 UTC

I kept the bug around in case we wanted to dig deeper into what the slowdown actually was. If it's below the level of care, lets close and move on.

Comment 22 Dorota Czaplejewicz 2017-01-27 10:11:35 UTC

I haven't seen the underlying slowness causing problems anywhere else yet. Perhaps we can close this and reopen if it reappears.

Comment 23 Elizabeth 2017-09-14 22:38:15 UTC

Closing. Please reopen if needed. Thanks.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.