Summary: | [IVB] CONFIG_PROVE_LOCKING=y causes igt/gem_exec/basic-wait-all (and possibly others) to fail | ||
---|---|---|---|
Product: | DRI | Reporter: | Dorota Czaplejewicz <dorota.czaplejewicz> |
Component: | IGT | Assignee: | Default DRI bug account <dri-devel> |
Status: | CLOSED WORKSFORME | QA Contact: | |
Severity: | normal | ||
Priority: | medium | ||
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
Dorota Czaplejewicz
2016-12-17 16:48:01 UTC
Created attachment 128525 [details]
basic-wait-all dmesg log
Software: Fedora 24 kernel revision: 5415795 (drm-tip: 2016y-12m-16d-21h-26m-45s UTC integration manifest) kernel config: https://intel-gfx-ci.01.org/CI/CI_DRM_1968/kernel.config.bz2 Hardware: Intel i7-3770 Looks like the timeout value isn't being computed properly (possibly just that too much time is unaccounted?). Can you apply: diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 1b2f18d0ca2b..dfe751b350ad 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3019,6 +3019,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { struct drm_i915_gem_wait *args = data; struct drm_i915_gem_object *obj; + u64 prev_timeout = args->timeout_ns; ktime_t start; long ret; @@ -3042,6 +3043,12 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file) args->timeout_ns = 0; } + pr_info("%s: timeout %d.%09d -> %d.%09d\n", __func__, + prev_timeout < 0 ? -1 : ns_to_timespec(prev_timeout).tv_sec, + prev_timeout < 0 ? 0 : ns_to_timespec(prev_timeout).tv_nsec, + args->timeout_ns < 0 ? -1 : ns_to_timespec(args->timeout_ns).tv_sec, + args->timeout_ns < 0 ? 0 : ns_to_timespec(args->timeout_ns).tv_nsec); + i915_gem_object_put(obj); return ret; } and get the output for a failure? Created attachment 128544 [details] Dmesg log with applied patch from comment #3 The same config option triggers failures in a similar way in: igt/gem_sync/basic-each igt/gem_sync/basic-store-each Created attachment 128545 [details]
igt/gem_sync/basic-each log
Created attachment 128546 [details]
igt/gem_sync/basic-each - dmesg (truncated beginning)
Created attachment 128547 [details]
igt/gem_sync/basic-store-each - log
Created attachment 128548 [details]
igt/gem_sync/basic-store-each - dmesg (truncated beginning)
[ 154.074065] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT [ 154.074748] i915_gem_wait_ioctl: timeout 0.500000000 -> 0.499993351 [ 154.075390] [drm:drm_ioctl] ret = -512 ... [ 155.072528] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT [ 155.073234] i915_gem_wait_ioctl: timeout 0.488539596 -> 0.488535373 [ 155.073916] [drm:drm_ioctl] ret = -512 [ 155.074605] [drm:drm_ioctl] pid=2323, dev=0xe200, auth=1, I915_GEM_WAIT [ 155.075307] i915_gem_wait_ioctl: timeout 0.488535373 -> 0.488532529 Walltime of 1s, but actual wait time of ~12ms. Each ioctl is taking ~1.4ms (inside drmIoctl) with a further interval of ~0.7ms between ioctls. And we account for 3us. Created attachment 128606 [details]
ftrace call graph log for i915_gem_wait_ioctl
The call log capturing i915_gem_wait_ioctl invokation and overhead.
The time of ioctl itself (20ns) is dwarfed by a slowdown somewhere in console_unlock() related to printk() calls (650ns)
cat /proc/consoles? Also trying with CONFIG_DRM_FBDEV_EMULATION unset. # cat /proc/consoles tty0 -WU (EC p ) 4:1 Upon further digging, the slow stack *seems* like this (may be wrong, the tracing tool is less useful with huge numbers of calls): console_unlock() call_console_drivers.isra.0.constprop.15() vt_console_print() lf() scrup() fbcon_scroll() fbcon_cursor() fbcon_redraw.isra.17() Inside of this, the delay is split up among many shorter calls, most time occupied by fbcon_putcs -> bit_putcs -> cfb_imageblit. That's us! Woohoo \o/ What does cat /sys/kernel/debug/dri/0/i915_gem_framebuffer report? Mainly interested in knowing if you have a tiled + fenced fbcon. Here it is: $ sudo cat /sys/kernel/debug/dri/0/i915_gem_framebuffer fbcon size: 1024 x 768, depth 24, 32 bpp, modifier 0x0, refcount 1, obj ffff880221116200: p 3072KiB 41 00 uncached (pinned x 1) (display) (ggtt offset: 00080000, size: 00300000, type: 0) Try: diff --git a/lib/igt_core.c b/lib/igt_core.c index a18a728c..403b9423 100644 --- a/lib/igt_core.c +++ b/lib/igt_core.c @@ -67,6 +67,7 @@ #include "igt_core.h" #include "igt_aux.h" +#include "igt_sysfs.h" #ifdef HAVE_LIBGEN_H #include <libgen.h> /* for basename() on Solaris */ @@ -512,6 +513,7 @@ static void common_exit_handler(int sig) { if (!igt_only_list_subtests()) { low_mem_killer_disable(false); + kick_fbcon(true); } /* When not killed by a signal check that igt_exit() has been properly @@ -731,6 +733,7 @@ out: exit(ret == -1 ? 0 : IGT_EXIT_INVALID); if (!list_subtests) { + kick_fbcon(false); kmsg(KERN_INFO "[IGT] %s: executing\n", command_str); print_version(); After adding the kick_fbcon() calls, the problem goes away, even without rebooting. Should this be turned into a patch for igt or should I try to identify the underlying issue causing the slowness? I just realized the patch was merged. Should this bug be considered fixed now? I kept the bug around in case we wanted to dig deeper into what the slowdown actually was. If it's below the level of care, lets close and move on. I haven't seen the underlying slowness causing problems anywhere else yet. Perhaps we can close this and reopen if it reappears. Closing. Please reopen if needed. Thanks. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.