Created attachment 128524 [details] basic-wait-all log Kernels built with the default CI config (e.g. https://intel-gfx-ci.01.org/CI/CI_DRM_1968/kernel.config.bz2 ) will trigger failures in the test igt/gem_exec/basic-wait-all The failures will rarely appear immediately; for faithful reproduction, running the test several times consecutively is needed (at least 4 times). Afterwards, the failures will be constant. Changing CONFIG_PROVE_LOCKING from "=y" to "=n" will stop the failures.
Created attachment 128525 [details] basic-wait-all dmesg log
Software: Fedora 24 kernel revision: 5415795 (drm-tip: 2016y-12m-16d-21h-26m-45s UTC integration manifest) kernel config: https://intel-gfx-ci.01.org/CI/CI_DRM_1968/kernel.config.bz2 Hardware: Intel i7-3770
Looks like the timeout value isn't being computed properly (possibly just that too much time is unaccounted?). Can you apply: diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 1b2f18d0ca2b..dfe751b350ad 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3019,6 +3019,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { struct drm_i915_gem_wait *args = data; struct drm_i915_gem_object *obj; + u64 prev_timeout = args->timeout_ns; ktime_t start; long ret; @@ -3042,6 +3043,12 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file) args->timeout_ns = 0; } + pr_info("%s: timeout %d.%09d -> %d.%09d\n", __func__, + prev_timeout < 0 ? -1 : ns_to_timespec(prev_timeout).tv_sec, + prev_timeout < 0 ? 0 : ns_to_timespec(prev_timeout).tv_nsec, + args->timeout_ns < 0 ? -1 : ns_to_timespec(args->timeout_ns).tv_sec, + args->timeout_ns < 0 ? 0 : ns_to_timespec(args->timeout_ns).tv_nsec); + i915_gem_object_put(obj); return ret; } and get the output for a failure?
Created attachment 128544 [details] Dmesg log with applied patch from comment #3
The same config option triggers failures in a similar way in: igt/gem_sync/basic-each igt/gem_sync/basic-store-each
Created attachment 128545 [details] igt/gem_sync/basic-each log
Created attachment 128546 [details] igt/gem_sync/basic-each - dmesg (truncated beginning)
Created attachment 128547 [details] igt/gem_sync/basic-store-each - log
Created attachment 128548 [details] igt/gem_sync/basic-store-each - dmesg (truncated beginning)
[ 154.074065] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT [ 154.074748] i915_gem_wait_ioctl: timeout 0.500000000 -> 0.499993351 [ 154.075390] [drm:drm_ioctl] ret = -512 ... [ 155.072528] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT [ 155.073234] i915_gem_wait_ioctl: timeout 0.488539596 -> 0.488535373 [ 155.073916] [drm:drm_ioctl] ret = -512 [ 155.074605] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT [ 155.075307] i915_gem_wait_ioctl: timeout 0.488535373 -> 0.488532529 Walltime of 1s, but actual wait time of ~12ms. Each ioctl is taking ~1.4ms (inside drmIoctl) with a further interval of ~0.7ms between ioctls. And we account for 3us.
Created attachment 128606 [details] ftrace call graph log for i915_gem_wait_ioctl The call log capturing i915_gem_wait_ioctl invokation and overhead. The time of ioctl itself (20ns) is dwarfed by a slowdown somewhere in console_unlock() related to printk() calls (650ns)
cat /proc/consoles?
Also trying with CONFIG_DRM_FBDEV_EMULATION unset.
# cat /proc/consoles tty0 -WU (EC p ) 4:1 Upon further digging, the slow stack *seems* like this (may be wrong, the tracing tool is less useful with huge numbers of calls): console_unlock() call_console_drivers.isra.0.constprop.15() vt_console_print() lf() scrup() fbcon_scroll() fbcon_cursor() fbcon_redraw.isra.17() Inside of this, the delay is split up among many shorter calls, most time occupied by fbcon_putcs -> bit_putcs -> cfb_imageblit.
That's us! Woohoo \o/ What does cat /sys/kernel/debug/dri/0/i915_gem_framebuffer report? Mainly interested in knowing if you have a tiled + fenced fbcon.
Here it is: $ sudo cat /sys/kernel/debug/dri/0/i915_gem_framebuffer fbcon size: 1024 x 768, depth 24, 32 bpp, modifier 0x0, refcount 1, obj ffff880221116200: p 3072KiB 41 00 uncached (pinned x 1) (display) (ggtt offset: 00080000, size: 00300000, type: 0)
Try: diff --git a/lib/igt_core.c b/lib/igt_core.c index a18a728c..403b9423 100644 --- a/lib/igt_core.c +++ b/lib/igt_core.c @@ -67,6 +67,7 @@ #include "igt_core.h" #include "igt_aux.h" +#include "igt_sysfs.h" #ifdef HAVE_LIBGEN_H #include <libgen.h> /* for basename() on Solaris */ @@ -512,6 +513,7 @@ static void common_exit_handler(int sig) { if (!igt_only_list_subtests()) { low_mem_killer_disable(false); + kick_fbcon(true); } /* When not killed by a signal check that igt_exit() has been properly @@ -731,6 +733,7 @@ out: exit(ret == -1 ? 0 : IGT_EXIT_INVALID); if (!list_subtests) { + kick_fbcon(false); kmsg(KERN_INFO "[IGT] %s: executing\n", command_str); print_version();
After adding the kick_fbcon() calls, the problem goes away, even without rebooting.
Should this be turned into a patch for igt or should I try to identify the underlying issue causing the slowness?
I just realized the patch was merged. Should this bug be considered fixed now?
I kept the bug around in case we wanted to dig deeper into what the slowdown actually was. If it's below the level of care, lets close and move on.
I haven't seen the underlying slowness causing problems anywhere else yet. Perhaps we can close this and reopen if it reappears.
Closing. Please reopen if needed. Thanks.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.