Bug 93847 - GuC is calling a sleeping function in atomic context
Summary: GuC is calling a sleeping function in atomic context
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other
OS: All
Priority: medium
Severity: normal
Assignee: Dave Gordon
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-25 11:37 UTC by Tvrtko Ursulin
Modified: 2017-07-24 22:43 UTC
CC List: 1 user

See Also:
i915 platform: BXT, SKL
i915 features: power/Other


Description Tvrtko Ursulin 2016-01-25 11:37:10 UTC
In i915_guc_wq_check_space(), usleep_range() is called while a kmap_atomic() mapping is held, i.e. in atomic context. This results in:

[   34.098798] BUG: scheduling while atomic: gem_close_race/1941/0x00000002
[   34.098822] Modules linked in: hid_generic usbhid i915 asix usbnet libphy mii i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm coretemp i2c_hid hid video pinctrl_sunrisepoint pinctrl_intel acpi_pad nls_iso8859_1 e1000e ptp psmouse pps_core ahci libahci
[   34.098824] CPU: 0 PID: 1941 Comm: gem_close_race Tainted: G     U          4.4.0-160121+ #123
[   34.098824] Hardware name: Intel Corporation Skylake Client platform/Skylake AIO DDR3L RVP10, BIOS SKLSE2R1.R00.X100.B01.1509220551 09/22/2015
[   34.098825]  0000000000013e40 ffff880166c27a78 ffffffff81280d02 ffff880172c13e40
[   34.098826]  ffff880166c27a88 ffffffff810c203a ffff880166c27ac8 ffffffff814ec808
[   34.098827]  ffff88016b7c6000 ffff880166c28000 00000000000f4240 0000000000000001
[   34.098827] Call Trace:
[   34.098831]  [<ffffffff81280d02>] dump_stack+0x4b/0x79
[   34.098833]  [<ffffffff810c203a>] __schedule_bug+0x41/0x4f
[   34.098834]  [<ffffffff814ec808>] __schedule+0x5a8/0x690
[   34.098835]  [<ffffffff814ec927>] schedule+0x37/0x80
[   34.098836]  [<ffffffff814ef3fd>] schedule_hrtimeout_range_clock+0xad/0x130
[   34.098837]  [<ffffffff81090be0>] ? hrtimer_init+0x10/0x10
[   34.098838]  [<ffffffff814ef3f1>] ? schedule_hrtimeout_range_clock+0xa1/0x130
[   34.098839]  [<ffffffff814ef48e>] schedule_hrtimeout_range+0xe/0x10
[   34.098840]  [<ffffffff814eef9b>] usleep_range+0x3b/0x40
[   34.098853]  [<ffffffffa01ec109>] i915_guc_wq_check_space+0x119/0x210 [i915]
[   34.098861]  [<ffffffffa01da47c>] intel_logical_ring_alloc_request_extras+0x5c/0x70 [i915]
[   34.098869]  [<ffffffffa01cdbf1>] i915_gem_request_alloc+0x91/0x170 [i915]
[   34.098875]  [<ffffffffa01c1c07>] i915_gem_do_execbuffer.isra.25+0xbc7/0x12a0 [i915]
[   34.098882]  [<ffffffffa01cb785>] ? i915_gem_object_get_pages_gtt+0x225/0x3c0 [i915]
[   34.098889]  [<ffffffffa01d1fb6>] ? i915_gem_pwrite_ioctl+0xd6/0x9f0 [i915]
[   34.098895]  [<ffffffffa01c2e68>] i915_gem_execbuffer2+0xa8/0x250 [i915]
[   34.098900]  [<ffffffffa00f65d8>] drm_ioctl+0x258/0x4f0 [drm]
[   34.098906]  [<ffffffffa01c2dc0>] ? i915_gem_execbuffer+0x340/0x340 [i915]
[   34.098908]  [<ffffffff8111590d>] do_vfs_ioctl+0x2cd/0x4a0
[   34.098909]  [<ffffffff8111eac2>] ? __fget+0x72/0xb0
[   34.098910]  [<ffffffff81115b1c>] SyS_ioctl+0x3c/0x70
[   34.098911]  [<ffffffff814effd7>] entry_SYSCALL_64_fastpath+0x12/0x6a
[   34.100208] ------------[ cut here ]------------
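
For illustration, a minimal sketch of the offending pattern. The structure names, fields, and retry bounds below are hypothetical stand-ins, not the actual i915 source: kmap_atomic() disables preemption, so any call that can schedule, such as usleep_range(), is illegal until the matching kunmap_atomic().

/*
 * Sketch of the buggy pattern only. struct guc_client, struct
 * guc_process_desc and the retry bounds are hypothetical stand-ins.
 */
#include <linux/types.h>
#include <linux/highmem.h>
#include <linux/delay.h>
#include <linux/circ_buf.h>

struct guc_process_desc {
	u32 head;			/* consumed index, updated by the GuC */
};

struct guc_client {
	struct page *desc_page;		/* backs the GuC process descriptor */
	u32 wq_tail;
	u32 wq_size;
};

static int guc_wq_check_space_buggy(struct guc_client *gc, u32 size)
{
	struct guc_process_desc *desc;
	int ret = -ETIMEDOUT, i;

	/* kmap_atomic() disables preemption: atomic context begins here. */
	desc = kmap_atomic(gc->desc_page);

	for (i = 0; i < 20; i++) {
		if (CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size) >= size) {
			ret = 0;
			break;
		}
		/*
		 * BUG: usleep_range() may schedule, which is illegal in
		 * atomic context -- hence "scheduling while atomic" above.
		 */
		usleep_range(1000, 1000);
	}

	kunmap_atomic(desc);
	return ret;
}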
Comment 1 Chris Wilson 2016-03-27 15:03:51 UTC
Sleeping whilst atomic is just the tip of the iceberg. That's a 20 millisecond uninterruptible wait!
Comment 2 Chris Wilson 2016-03-30 12:07:12 UTC
So things to note here: using a single page for a workqueue (256 entries) is still more than the number of requests we can fit into a ringbuffer, and so does not become the rate-limiting factor on submission; i.e. we can move to a single page and a persistent kmap and speed things up with simpler code.

There are quite a few ABI issues (returning incorrect timeouts to userspace etc) and ignored error propagation leading to silly bugs in GuC submission, and many dubious decisions like atomic64_cmpxchg!
Comment 3 Dave Gordon 2016-04-21 17:12:11 UTC
Fixed by:
0d92a6a drm/i915/guc: keep GuC doorbell & process descriptor mapped in kernel
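
The idea of the fix, sketched with the same hypothetical structures as above (the commit itself is the authoritative version): establish the mapping once with kmap(), which may sleep but is called from process context at client-creation time, so the submission path never holds an atomic mapping while it waits.

/*
 * Sketch of the fix only, reusing the hypothetical structures from the
 * earlier sketch plus a persistent descriptor pointer.
 */
struct guc_client {
	struct page *desc_page;
	struct guc_process_desc *desc;	/* persistent mapping (new) */
	u32 wq_tail;
	u32 wq_size;
};

static void guc_client_map_desc(struct guc_client *gc)
{
	/* kmap() may sleep, but we are in process context at init time. */
	gc->desc = kmap(gc->desc_page);
}

static void guc_client_unmap_desc(struct guc_client *gc)
{
	kunmap(gc->desc_page);		/* torn down with the client */
}

static int guc_wq_check_space_fixed(struct guc_client *gc, u32 size)
{
	struct guc_process_desc *desc = gc->desc;
	int i;

	for (i = 0; i < 20; i++) {
		if (CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size) >= size)
			return 0;
		/* Legal now: no atomic mapping is held while we sleep. */
		usleep_range(1000, 1000);
	}
	return -ETIMEDOUT;
}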
Comment 4 Dave Gordon 2016-04-21 17:59:25 UTC
(In reply to Chris Wilson from comment #2)
> So things to note here: using a single page for a workqueue (256 entries)
> is still more than the number of requests we can fit into a ringbuffer,
> and so does not become the rate-limiting factor on submission; i.e. we can
> move to a single page and a persistent kmap and speed things up with
> simpler code.

'Fraid not: the workqueue size is fixed by the firmware; we don't get to choose whether to use one page or two (or four).

Secondly, the WQ is shared between all engines; the GuC then demultiplexes requests into separate submission queues internally. So one page might not be enough if all engines were really busy.

OTOH with the scheduler we flow-control submissions at an earlier stage, so at that point we can be sure that the GuC WQ will not be a bottleneck.

> There are quite a few ABI issues (returning incorrect timeouts to userspace
> etc) and ignored error propagation leading to silly bugs in GuC submission,
> and many dubious decisions like atomic64_cmpxchg!

