| Field | Value | Field | Value |
|---|---|---|---|
| Summary | gem cancel_userptr warning | | |
| Product | DRI | Reporter | mwa <matthew.auld> |
| Component | DRM/Intel | Assignee | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Status | CLOSED FIXED | QA Contact | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity | normal | | |
| Priority | medium | CC | intel-gfx-bugs |
| Version | DRI git | | |
| Hardware | x86-64 (AMD64) | | |
| OS | Linux (All) | | |
| Whiteboard | | | |
| i915 platform | | i915 features | |
| Attachments | | | |
Description
mwa
2016-08-14 12:51:09 UTC
Try https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=fence and tell me what the WARN reports (if it fires).

    [ 136.189244] WARNING: CPU: 2 PID: 113 at drivers/gpu/drm/i915/i915_gem_userptr.c:94 cancel_userptr+0x208/0x250 [i915]
    [ 136.189245] Failed to release pages: bind_count=1, pages_pin_count=1, pin_display=0
    [ 136.189245] Modules linked in: rfcomm fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_security ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep arc4 iwlmvm mac80211 iTCO_wdt iTCO_vendor_support uvcvideo iwlwifi intel_rapl x86_pkg_temp_thermal coretemp videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_hda_codec_realtek videobuf2_core snd_hda_codec_hdmi snd_hda_codec_generic videodev cfg80211 snd_hda_intel
    [ 136.189271] btusb snd_hda_codec btrtl media joydev btbcm snd_hwdep rtsx_pci_ms snd_hda_core btintel memstick bluetooth nfsd snd_seq thinkpad_acpi wmi snd_seq_device rfkill snd_pcm auth_rpcgss intel_rst snd_timer nfs_acl snd mei_me lockd mei tpm_tis shpchp i2c_i801 lpc_ich tpm_tis_core intel_pch_thermal soundcore tpm i2c_smbus grace sunrpc dm_crypt hid_microsoft i915 i2c_algo_bit drm_kms_helper rtsx_pci_sdmmc mmc_core drm e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ptp serio_raw rtsx_pci pps_core fjes video
    [ 136.189294] CPU: 2 PID: 113 Comm: kworker/u16:3 Tainted: G W 4.8.0-rc1-drm-intel+ #62
    [ 136.189295] Hardware name: LENOVO 20BW000FUK/20BW000FUK, BIOS JBET54WW (1.19 ) 11/06/2015
    [ 136.189312] Workqueue: i915-userptr-release cancel_userptr [i915]
    [ 136.189313] 0000000000000286 0000000052dfdf5c ffff88022ba13d20 ffffffff813dcf2d
    [ 136.189315] ffff88022ba13d70 0000000000000000 ffff88022ba13d60 ffffffff810a750b
    [ 136.189316] 0000005e3dc99538 ffff880228e70068 ffff8801afb68f00 ffffffff81552000
    [ 136.189318] Call Trace:
    [ 136.189321] [<ffffffff813dcf2d>] dump_stack+0x63/0x86
    [ 136.189324] [<ffffffff810a750b>] __warn+0xcb/0xf0
    [ 136.189325] [<ffffffff81552000>] ? fence_wait_timeout.part.9+0xc0/0xc0
    [ 136.189327] [<ffffffff810a758f>] warn_slowpath_fmt+0x5f/0x80
    [ 136.189340] [<ffffffffa01e71b8>] cancel_userptr+0x208/0x250 [i915]
    [ 136.189342] [<ffffffff810c0824>] process_one_work+0x184/0x410
    [ 136.189343] [<ffffffff810c0afe>] worker_thread+0x4e/0x480
    [ 136.189344] [<ffffffff810c0ab0>] ? process_one_work+0x410/0x410
    [ 136.189346] [<ffffffff810c6618>] kthread+0xd8/0xf0
    [ 136.189348] [<ffffffff817e38bf>] ret_from_fork+0x1f/0x40
    [ 136.189350] [<ffffffff810c6540>] ? kthread_worker_fn+0x180/0x180
    [ 136.189351] ---[ end trace 694ecd151f58f1a6 ]---

Created attachment 125781: unbind userptr uninterruptibly

Still has a binding, so let's restore the uninterruptible unbind.

Still hit the same warning, unfortunately.

A bit more information about the unbind failure, perhaps?
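The warning reports bind_count=1 and pages_pin_count=1, and the uninterruptible unbind was restored precisely so that this binding would go away before the pages are released. Here is a minimal C sketch of why a surviving binding blocks the release. It assumes, as a simplification consistent with those counters, that each binding holds exactly one pin on the backing pages. `struct model_obj`, `model_bind()`, `model_unbind()` and `model_put_pages()` are hypothetical and are not i915 code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a GEM object (not the real i915 struct). */
struct model_obj {
	int bind_count;      /* number of live bindings */
	int pages_pin_count; /* pins held on the backing pages */
	bool has_pages;      /* backing pages still allocated */
};

/* Assumption: each binding takes one pin on the backing pages. */
static void model_bind(struct model_obj *obj)
{
	assert(obj->has_pages); /* a binding needs backing pages */
	obj->bind_count++;
	obj->pages_pin_count++;
}

/* Dropping a binding releases the pin it held. */
static void model_unbind(struct model_obj *obj)
{
	assert(obj->bind_count > 0 && obj->pages_pin_count > 0);
	obj->bind_count--;
	obj->pages_pin_count--;
}

/* Pages can only be released once nothing pins them. */
static bool model_put_pages(struct model_obj *obj)
{
	if (obj->pages_pin_count > 0)
		return false;
	obj->has_pages = false;
	return true;
}
```

Binding once and then calling `model_put_pages()` fails with both counters at 1, which matches the log. The pages only go away after `model_unbind()` drops the pin.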
    diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
    index fdd7c0a12127..2138e5eea31d 100644
    --- a/drivers/gpu/drm/i915/i915_gem.c
    +++ b/drivers/gpu/drm/i915/i915_gem.c
    @@ -2773,7 +2773,9 @@ int i915_vma_unbind(struct i915_vma *vma)
     		GEM_BUG_ON(i915_vma_is_active(vma));
     	}
     
    -	if (i915_vma_is_pinned(vma))
    +	if (WARN(i915_vma_is_pinned(vma),
    +		 "vma is still pinned [%d], flags=%x\n",
    +		 i915_vma_pin_count(vma), vma->flags))
     		return -EBUSY;
     
     	if (!drm_mm_node_allocated(&vma->node))

For fun, also:

    diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
    index 8292e797d9b5..cec1aa4e1152 100644
    --- a/drivers/gpu/drm/i915/i915_gem.h
    +++ b/drivers/gpu/drm/i915/i915_gem.h
    @@ -25,10 +25,6 @@
     #ifndef __I915_GEM_H__
     #define __I915_GEM_H__
     
    -#ifdef CONFIG_DRM_I915_DEBUG_GEM
    -#define GEM_BUG_ON(expr) BUG_ON(expr)
    -#else
    -#define GEM_BUG_ON(expr)
    -#endif
    +#define GEM_BUG_ON(expr) WARN_ON(expr)
     
     #endif /* __I915_GEM_H__ */

Same thing again. Interestingly, I never actually hit the WARN in vma_unbind. Hmm, if I add:

    diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
    index 94fc051..8b66098 100644
    --- a/drivers/gpu/drm/i915/i915_gem.c
    +++ b/drivers/gpu/drm/i915/i915_gem.c
    @@ -286,7 +286,7 @@ i915_gem_object_unbind(struct drm_i915_gem_object *obj)
     {
     	struct i915_vma *vma;
     	LIST_HEAD(still_in_list);
    -	int ret;
    +	int ret = -42;
     
     	/* The vma will only be freed if it is marked as closed, and if we wait
     	 * upon rendering to the vma, we may unbind anything in the list.
    diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
    index e20b653..06b3ed1 100644
    --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
    +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
    @@ -77,6 +77,7 @@ static void cancel_userptr(struct work_struct *work)
     	struct drm_i915_gem_object *obj = mo->obj;
     	struct drm_device *dev = obj->base.dev;
     	bool was_interruptible;
    +	int ret;
     
     	wait_rendering(obj);
     
    @@ -89,10 +90,13 @@ static void cancel_userptr(struct work_struct *work)
     	to_i915(dev)->mm.interruptible = false;
     
     	/* We are inside a kthread context and can't be interrupted */
    -	if (i915_gem_object_unbind(obj) == 0)
    +	ret = i915_gem_object_unbind(obj);
    +	if (ret == 0)
     		__i915_gem_object_put_pages(obj);
    +
     	WARN_ONCE(obj->mm.pages,
    -		  "Failed to release pages: bind_count=%d, pages_pin_count=%d, pin_display=%d\n",
    +		  "Failed to release pages: ret=%d, bind_count=%d, pages_pin_count=%d, pin_display=%d\n",
    +		  ret,
     		  obj->bind_count,
     		  atomic_read(&obj->mm.pages_pin_count),
     		  obj->pin_display);

I get:

    Failed to release pages: ret=-42, bind_count=1, pages_pin_count=1, pin_display=0

Hmm, yup, ret is not initialised for the empty vma_list.

    diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
    index fdd7c0a12127..8f7bc47e5f5d 100644
    --- a/drivers/gpu/drm/i915/i915_gem.c
    +++ b/drivers/gpu/drm/i915/i915_gem.c
    @@ -286,7 +286,7 @@ i915_gem_object_unbind(struct drm_i915_gem_object *obj)
     {
     	struct i915_vma *vma;
     	LIST_HEAD(still_in_list);
    -	int ret;
    +	int ret = 0;

is a definite fix. But you have a bind_count != 0, so you must have some vma in there. :|

Ah. The obj_link is removed on i915_vma_close(), not upon free. Time to think about why. IIRC, my thinking was to remove it upon close so that it was unavailable for lookup immediately afterwards.
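The two findings above combine into a silent failure. Once a vma is closed it is no longer on the object's list, so the unbind loop has nothing to iterate, yet the binding (and bind_count) remains. Here is a minimal C sketch of that interaction. It assumes that close only unlinks the vma and that the binding is dropped later, on free. `struct model_vma`, `struct model_obj`, `model_vma_close()` and `model_object_unbind()` are hypothetical stand-ins, not i915 functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model (not i915 code). */
struct model_vma {
	struct model_vma *next; /* link in the object's lookup list */
	bool bound;
	bool pinned;
};

struct model_obj {
	struct model_vma *vma_list; /* vmas reachable for lookup/unbind */
	int bind_count;
};

/* Close unlinks the vma from the lookup list but leaves it bound;
 * in this model the binding would only be dropped when it is freed. */
static void model_vma_close(struct model_obj *obj, struct model_vma *vma)
{
	struct model_vma **p = &obj->vma_list;

	while (*p != vma) {
		assert(*p != NULL); /* vma must be on the object's list */
		p = &(*p)->next;
	}
	*p = vma->next;
	vma->next = NULL;
}

/* Unbind every vma still on the list. ret starts at 0 so an empty
 * list reports success rather than an indeterminate value. */
static int model_object_unbind(struct model_obj *obj)
{
	int ret = 0;

	for (struct model_vma *vma = obj->vma_list; vma; vma = vma->next) {
		if (vma->pinned) {
			ret = -EBUSY;
			break;
		}
		if (vma->bound) {
			vma->bound = false;
			obj->bind_count--;
		}
	}
	return ret;
}
```

In this model, closing the only vma and then calling `model_object_unbind()` returns 0 while `bind_count` stays at 1. Initialising `ret` removes the garbage return value, but on its own it does not release the binding.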
Magic fix:

    index 1f63a45fd6b0..fa9486608ddf 100644
    --- a/drivers/gpu/drm/i915/i915_gem.c
    +++ b/drivers/gpu/drm/i915/i915_gem.c
    @@ -279,12 +279,15 @@ static const struct drm_i915_gem_object_ops i915_gem_phys_ops = {
     	.release = i915_gem_object_release_phys,
     };
     
    -int
    -i915_gem_object_unbind(struct drm_i915_gem_object *obj)
    +int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
     {
     	struct i915_vma *vma;
     	LIST_HEAD(still_in_list);
    -	int ret = 0;
    +	int ret;
    +
    +	ret = i915_gem_object_wait_rendering(obj, false);
    +	if (ret)
    +		return ret;

Not magic enough...

More subtle fix:

     	struct i915_vma *vma;
     	LIST_HEAD(still_in_list);
    +	unsigned long active;
     	int ret = 0;
    +	active = i915_gem_object_get_active(obj);
    +	for_each_active(active, idx) {
    +		ret = i915_gem_active_retire(&obj->last_read[idx],
    +					     &obj->base.dev->struct_mutex);
    +		if (ret)
    +			return ret;
    +	}
    +

Created attachment 125782: Third go

Nice, that does indeed fix it.
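The "more subtle fix" retires every outstanding read of the object before walking its vma list. Presumably this is so that closed vmas, kept alive only by in-flight requests, are released first. Here is a minimal C sketch of that ordering under two assumptions: there is one active bit per engine, and a closed vma is freed (dropping its binding) once the object goes idle. `model_for_each_bit()`, `model_retire()` and `struct model_obj` are hypothetical stand-ins for `for_each_active()`, `i915_gem_active_retire()` and the real object, not their actual definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names are i915 APIs. */
#define MODEL_NUM_ENGINES 4

/* Visit each set bit of mask, lowest first, clearing it after each pass
 * (a for_each_active()-style iterator). __builtin_ctzl is a GCC/Clang builtin. */
#define model_for_each_bit(mask, idx) \
	for (; (mask) && ((idx) = __builtin_ctzl(mask), 1); (mask) &= (mask) - 1)

struct model_obj {
	unsigned long active; /* bit i set: engine i still reads the object */
	int closed_vmas;      /* vmas closed but kept alive while active */
	int bind_count;
};

/* Retire engine idx's outstanding read. Once the object is idle,
 * closed vmas are freed, which drops their bindings (assumption). */
static int model_retire(struct model_obj *obj, unsigned int idx)
{
	assert(idx < MODEL_NUM_ENGINES);
	obj->active &= ~(1UL << idx);
	if (!obj->active) {
		obj->bind_count -= obj->closed_vmas;
		obj->closed_vmas = 0;
	}
	return 0; /* the real retire can return an error, hence the check below */
}

static int model_object_unbind(struct model_obj *obj)
{
	unsigned long active = obj->active; /* iterate over a local copy */
	unsigned int idx;
	int ret = 0;

	/* Retire all outstanding activity first so closed vmas are freed. */
	model_for_each_bit(active, idx) {
		ret = model_retire(obj, idx);
		if (ret)
			return ret;
	}

	/* Unbinding the vmas still on the list is omitted in this sketch. */
	return ret;
}
```

Iterating over a local copy of the mask, as the patch does with `active`, keeps the loop well defined even though each retire also clears bits in the object's own mask.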