Bug 104341 - WARN_ON(i915_gem_object_has_pinned_pages(obj))
Summary: WARN_ON(i915_gem_object_has_pinned_pages(obj))
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-19 19:29 UTC by Adric Blake
Modified: 2018-09-21 10:41 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features: GEM/Other


Attachments
bo->refcnt error details and related logs (2.21 MB, application/zip)
2017-12-20 03:33 UTC, Adric Blake
no flags Details

Description Adric Blake 2017-12-19 19:29:11 UTC
During (presumably) normal operation. At least the second time I've seen this. Other bug is marked resolved, so...

[232113.105048] arch_pc kernel: ------------[ cut here ]------------
[232113.105051] arch_pc kernel: WARN_ON(i915_gem_object_has_pinned_pages(obj))
[232113.105166] arch_pc kernel: WARNING: CPU: 0 PID: 18187 at drivers/gpu/drm/i915/i915_gem.c:4652 __i915_gem_free_objects+0x2a7/0x2c0 [i915]
[232113.105167] arch_pc kernel: Modules linked in: nls_iso8859_1 nls_cp437 vfat fat nls_utf8 hfsplus fuse xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_set iptable_filter ip_set_hash_net ip_set nfnetlink snd_hda_codec_idt arc4 snd_hda_codec_generic b43 rng_core bcma joydev mac80211 i915 mousedev dell_wmi snd_hda_intel snd_hda_codec snd_hwdep sparse_keymap snd_hda_core cfg80211 snd_pcm gpio_ich wmi_bmof input_leds dell_wmi_descriptor iTCO_wdt dell_laptop intel_agp mac_hid iTCO_vendor_support dell_smbios i2c_i801 evdev wmi led_class rfkill i2c_algo_bit drm_kms_helper dcdbas syscopyarea ssb sysfillrect sysimgblt fb_sys_fops pcmcia dell_smm_hwmon drm pcmcia_core coretemp mmc_core pcspkr snd_timer psmouse snd intel_gtt shpchp agpgart thermal ac lpc_ich soundcore battery button video acpi_cpufreq sch_fq_codel sg crypto_user ip_tables
[232113.105222] arch_pc kernel:  x_tables ext4 crc16 mbcache jbd2 fscrypto sd_mod sr_mod cdrom ums_realtek uas usb_storage serio_raw atkbd libps2 uhci_hcd ahci libahci libata scsi_mod ehci_pci ehci_hcd usbcore usb_common i8042 serio sky2
[232113.105245] arch_pc kernel: CPU: 0 PID: 18187 Comm: kworker/0:2 Tainted: G        W        4.15.0-rc3-15+rc3+988+gafc98c9fb9a8+723751 #1
[232113.105247] arch_pc kernel: Hardware name: Dell Inc. Inspiron 1545                   /0G848F, BIOS A14 12/07/2009
[232113.105279] arch_pc kernel: Workqueue: events __i915_gem_free_work [i915]
[232113.105311] arch_pc kernel: RIP: 0010:__i915_gem_free_objects+0x2a7/0x2c0 [i915]
[232113.105312] arch_pc kernel: RSP: 0000:ffffa8cec2cdfe30 EFLAGS: 00010282
[232113.105314] arch_pc kernel: RAX: 0000000000000000 RBX: ffff91a6fe44f600 RCX: ffffffff9ec55568
[232113.105315] arch_pc kernel: RDX: 0000000000000001 RSI: 0000000000000092 RDI: 0000000000000283
[232113.105317] arch_pc kernel: RBP: ffff91a6fe44f6f8 R08: 000000000000002e R09: 0000000000560e32
[232113.105318] arch_pc kernel: R10: ffffa8cec2cdfd50 R11: 0000000000000000 R12: ffff91a758cb0000
[232113.105319] arch_pc kernel: R13: ffff91a6e29a2400 R14: ffff91a6fe44f4f8 R15: ffff91a758cb0070
[232113.105321] arch_pc kernel: FS:  0000000000000000(0000) GS:ffff91a75fc00000(0000) knlGS:0000000000000000
[232113.105322] arch_pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[232113.105323] arch_pc kernel: CR2: 00007fbaac4f0ff0 CR3: 00000000243a0000 CR4: 00000000000406f0
[232113.105325] arch_pc kernel: Call Trace:
[232113.105362] arch_pc kernel:  __i915_gem_free_work+0x62/0x90 [i915]
[232113.105367] arch_pc kernel:  process_one_work+0x1de/0x410
[232113.105370] arch_pc kernel:  worker_thread+0x2b/0x3d0
[232113.105373] arch_pc kernel:  ? process_one_work+0x410/0x410
[232113.105375] arch_pc kernel:  kthread+0x111/0x130
[232113.105377] arch_pc kernel:  ? _kthread_create_worker_on_cpu+0x60/0x60
[232113.105381] arch_pc kernel:  ? do_group_exit+0x3a/0xa0
[232113.105384] arch_pc kernel:  ret_from_fork+0x1f/0x30
[232113.105386] arch_pc kernel: Code: ea 65 ff 0d dc 4e 7c 3f 0f 85 d9 fd ff ff e8 66 aa 7b dd e9 cf fd ff ff 48 c7 c6 20 d6 91 c0 48 c7 c7 30 a3 93 c0 e8 d9 5d 83 dd <0f> ff c7 83 c8 01 00 00 00 00 00 00 e9 7e fe ff ff 0f 1f 84 00 
[232113.105422] arch_pc kernel: ---[ end trace 2f0716295a440525 ]---

Arch Linux x86_64
linux-drm-tip-git 4.15+rc3+988+gafc98c9fb9a8+723751-1
    Build Date : Sat 16 Dec 2017 06:49:09 PM EST
xorg-server 1.19.5-1
mesa 17.3.0-2
libdrm 2.4.88-1
xf86-video-intel 2.99.917+802+gaf6d8e9e plus local revert of 21008aaa1f
Comment 1 Chris Wilson 2017-12-19 20:51:06 UTC
That warning doesn't tell us very much; just that there is a pinleak of the pages. Try drm.debug=0xe dmesg to see if there are any precursors.
Comment 2 Adric Blake 2017-12-20 03:33:38 UTC
Created attachment 136307 [details]
bo->refcnt error details and related logs

Not a capture of that warning (sorry!), but Xorg crashed with what might be a relevant assert, so attaching that and my boot log so far in case it's related.

[  4941.443] (EE) assert_bo_retired:325 assertion 'bo->refcnt' failed
Comment 3 Chris Wilson 2017-12-20 08:49:57 UTC
(In reply to Adric Blake from comment #2)
> Created attachment 136307 [details]
> bo->refcnt error details and related logs
> 
> Not a capture of that warning (sorry!), but Xorg crashed with what might be
> a relevant assert, so attaching that and my boot log so far in case it's
> related.
> 
> [  4941.443] (EE) assert_bo_retired:325 assertion 'bo->refcnt' failed

No, that's just the buffer cache tripping over itself. The refcnt check was there to optimistically prevent accidental use-after-free, but it also is being triggered on one path having just pulled the buffer out of the free cache.

commit 708255cbca83dbafe9154c39bd4553b0b81f09cf (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 20 08:46:43 2017 +0000

    sna: Drop redundant sync of an inactive snoop cache
    
    Having requested an inactive snooped buffer, we know that it is
    already flushed and do not need to flush it again. With debugging
    enabled, we hit an assert while flushing that the buffer has a refcout,
    which at this present time of being plucked from the snoop cache it does
    not.
    
    Reported-by: Adric Blake <promarbler14@gmail.com>
    References: https://bugs.freedesktop.org/show_bug.cgi?id=104341#c2
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 4 Adric Blake 2018-03-29 04:06:44 UTC
FYI, I haven't once seen this message again. No idea whether it was kernel or software changes that fixed it

linux-git 4.16rc5.r4.gfc6eabbbf8ef-1 (not drm-tip)
xorg-server 1.19.6+13+gd0d1a694f-1
mesa 17.3.7-1
libdrm 2.4.91-3
xf86-video-intel 1:2.99.917+814+gaa36399c-1
Comment 5 Jani Saarinen 2018-03-29 06:13:04 UTC
Do you think we can close this?
Comment 6 Jani Saarinen 2018-03-29 07:10:52 UTC
First of all. Sorry about spam.
This is mass update for our bugs. 

Sorry if you feel this annoying but with this trying to understand if bug still valid or not.
If bug investigation still in progress, please ignore this and I apologize!

If you think this is not anymore valid, please comment to the bug that can be closed.
If you haven't tested with our latest pre-upstream tree(drm-tip), can you do that also to see if issue is valid there still and if you cannot see issue there, please comment to the bug.
Comment 7 Adric Blake 2018-03-29 14:11:09 UTC
I have confidence that this can be closed.
Comment 8 Jani Saarinen 2018-03-29 14:20:34 UTC
I appreciate your feedback, resolving.
Comment 9 Adric Blake 2018-04-20 04:34:34 UTC
I've finally had it recur after about four months... unfortunately, no debug info since it was out of the blue.

Here's an unhelpful warning in case you can somehow discern a pattern.
[605072.893820] arch_pc kernel: ------------[ cut here ]------------
[605072.893824] arch_pc kernel: WARN_ON(i915_gem_object_has_pinned_pages(obj))
[605072.893951] arch_pc kernel: WARNING: CPU: 1 PID: 29874 at drivers/gpu/drm/i915/i915_gem.c:4668 __i915_gem_free_objects+0x2a4/0x2e0 [i915]
[605072.893952] arch_pc kernel: Modules linked in: ccm sctp libcrc32c crc32c_generic hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid fuse xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_set iptable_filter ip_set_hash_net ip_set nfnetlink arc4 b43 joydev bcma mousedev mac80211 snd_hda_codec_idt snd_hda_codec_generic cfg80211 rng_core iTCO_wdt gpio_ich iTCO_vendor_support dell_laptop wmi_bmof rfkill dell_wmi sparse_keymap snd_hda_intel dell_smbios i915 snd_hda_codec dell_wmi_descriptor dcdbas uvcvideo snd_hda_core snd_hwdep videobuf2_vmalloc dell_smm_hwmon videobuf2_memops snd_pcm coretemp snd_timer videobuf2_v4l2 ssb snd psmouse ums_realtek videobuf2_common i2c_algo_bit pcspkr uas videodev media soundcore drm_kms_helper input_leds led_class drm intel_agp mac_hid intel_gtt agpgart syscopyarea mmc_core sysfillrect i2c_i801
[605072.894013] arch_pc kernel:  pcmcia sysimgblt fb_sys_fops pcmcia_core lpc_ich shpchp wmi battery ac acpi_cpufreq evdev rtc_cmos sg crypto_user ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod usb_storage serio_raw atkbd libps2 uhci_hcd ahci libahci libata ehci_pci ehci_hcd scsi_mod usbcore usb_common i8042 serio sky2
[605072.894051] arch_pc kernel: CPU: 1 PID: 29874 Comm: kworker/1:0 Not tainted 4.16.2-1-ARCH #1
[605072.894052] arch_pc kernel: Hardware name: Dell Inc. Inspiron 1545                   /0G848F, BIOS A07 05/13/2009
[605072.894090] arch_pc kernel: Workqueue: events __i915_gem_free_work [i915]
[605072.894123] arch_pc kernel: RIP: 0010:__i915_gem_free_objects+0x2a4/0x2e0 [i915]
[605072.894124] arch_pc kernel: RSP: 0018:ffffb2568a97fe30 EFLAGS: 00010282
[605072.894127] arch_pc kernel: RAX: 0000000000000000 RBX: ffff9e1581fee700 RCX: 0000000000000001
[605072.894128] arch_pc kernel: RDX: 0000000080000001 RSI: 0000000000000086 RDI: 00000000ffffffff
[605072.894130] arch_pc kernel: RBP: ffff9e1581fee7f8 R08: 0000544b132ad098 R09: 00000000000004e8
[605072.894131] arch_pc kernel: R10: fffff22bc103b2c0 R11: 0000000000000000 R12: ffff9e1698ee8000
[605072.894133] arch_pc kernel: R13: fffffffffffffed8 R14: ffff9e1581fee5f8 R15: ffff9e1698ee8070
[605072.894135] arch_pc kernel: FS:  0000000000000000(0000) GS:ffff9e169fd00000(0000) knlGS:0000000000000000
[605072.894136] arch_pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[605072.894138] arch_pc kernel: CR2: 00007f6205085000 CR3: 00000000c448c000 CR4: 00000000000406e0
[605072.894139] arch_pc kernel: Call Trace:
[605072.894178] arch_pc kernel:  __i915_gem_free_work+0x62/0x90 [i915]
[605072.894184] arch_pc kernel:  process_one_work+0x1ce/0x3f0
[605072.894188] arch_pc kernel:  worker_thread+0x2b/0x3d0
[605072.894191] arch_pc kernel:  ? process_one_work+0x3f0/0x3f0
[605072.894193] arch_pc kernel:  kthread+0x113/0x130
[605072.894196] arch_pc kernel:  ? kthread_create_on_node+0x70/0x70
[605072.894199] arch_pc kernel:  ? do_syscall_64+0x74/0x190
[605072.894202] arch_pc kernel:  ? SyS_exit_group+0x10/0x10
[605072.894206] arch_pc kernel:  ret_from_fork+0x35/0x40
[605072.894209] arch_pc kernel: Code: e6 fd ff ff e8 55 d5 98 ed e9 dc fd ff ff e8 d4 fd ec ed e9 e1 fe ff ff 48 c7 c6 10 ce 75 c0 48 c7 c7 49 db 74 c0 e8 ac 6e a0 ed <0f> 0b c7 83 c8 01 00 00 00 00 00 00 e9 84 fe ff ff 48 8b 45 00 
[605072.894255] arch_pc kernel: ---[ end trace 97e4fd59c33fcb3b ]---

Maybe it's caused by slowness? I believe I was doing heavy I/O at the time, but I can't be certain.

Linux arch_pc 4.16.2-1-ARCH #1 SMP PREEMPT Thu Apr 12 13:51:26 UTC 2018 x86_64 GNU/Linux
xorg-server 1.19.6+13+gd0d1a694f-2
mesa 18.0.0-4 (upgraded to 18.0.1-1 about an hour prior but hadn't restarted.)
libdrm 2.4.91-3
xf86-video-intel 1:2.99.917+827+g02dfb919-1
Comment 10 Jani Saarinen 2018-04-24 06:44:18 UTC
Chris, any comments here?
Comment 11 Adric Blake 2018-06-16 01:32:32 UTC
Reoccurred today with kernel 4.17. It seems that the warning occurs completely randomly, independent of userspace. Unfortunately I still am unsure how to debug this without bogging down my system for over a month.
Comment 12 Adric Blake 2018-07-14 19:07:27 UTC
Happened once again, and while scrolling an innocent page in firefox. This time, I did have considerable memory pressure and was pushing into swap. Maybe part of the process is getting interrupted by memory reclamation/maintenance? I would figure that such operations would be uninterruptible, though.
Comment 13 Adric Blake 2018-07-30 16:44:17 UTC
Rather than something so esoteric, it could just be a side-effect of bug 103025. Can't be sure that they're related, but looking at the kernel sources where that bug seems to occur does suggest the possibility.
Comment 14 Lakshmi 2018-09-14 09:56:51 UTC
Adric can you attach latest teh latest dmesg from boot with kernel parameters drm.debug=0x1e log_buf_len=4M.

Apart from seeing this warning, do you see any other issues due to this warning? Trying to understand what is the impact of this warning.
Comment 15 Adric Blake 2018-09-14 11:16:44 UTC
I do not see any side effects aside from the warning. And getting drm debugging info will likely take weeks, even with drm.debug constantly on. I can't say whether the bug even affects kernel 4.18.

For now, I suppose you can resolve this as working and I'll reopen if I can capture the bug.
Comment 16 Lakshmi 2018-09-21 10:41:50 UTC
Closing as works for me. 
Warning here implies that memory is locked due to pinning of pages.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.