Created attachment 112260 [details] dmesg ==System Environment== -------------------------- Regression: yes good commit: 985850bd145655d10dfcd5a03a3fc38540794ca7 bad commit:cd52471999ef08bca35568525c1d85d883ffedb6 Non-working platforms: all ==kernel== -------------------------- drm-intel-nightly/9c4bdce37d09c0682f04bb5e6d0567def5c8d786 commit 9c4bdce37d09c0682f04bb5e6d0567def5c8d786 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Jan 13 23:27:51 2015 +0100 drm-intel-nightly: 2015y-01m-13d-22h-27m-23s UTC integration manifest ==Bug detailed description== ----------------------------- It causes system hang on drm-intel-nightly and drm-intel-next-queued kernel, works well on drm-intel-fixes kernel. output: unbinding /sys/class/vtconsole/vtcon0/: (M) frame buffer device dmesg [ 47.332841] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 [ 47.332847] IP: [<ffffffff8110e90b>] kmem_cache_alloc+0xd9/0x113 [ 47.332853] PGD 1195a5067 PUD 11937d067 PMD 0 [ 47.332856] Oops: 0000 [#1] SMP [ 47.332860] Modules linked in: netconsole configfs ip6table_filter ip6_tables ipv6 iptable_filter ip_tables ebtable_nat ebtables x_tables dm_mod snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support dcdbas i2c_i801 pcspkr serio_raw snd_hda_controller snd_hda_codec snd_hwdep lpc_ich shpchp mfd_core snd_pcm snd_timer snd soundcore battery acpi_cpufreq i915(-) button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea [last unloaded: snd_hda_intel] [ 47.332896] CPU: 3 PID: 1078 Comm: kworker/u16:3 Not tainted 3.19.0-rc4_drm-intel-nightly_9c4bdc_20150114+ #410 [ 47.332899] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A03 09/17/2013 [ 47.332904] Workqueue: khelper __call_usermodehelper [ 47.332906] task: ffff880119690000 ti: ffff880114490000 task.ti: ffff880114490000 [ 47.332908] RIP: 0010:[<ffffffff8110e90b>] [<ffffffff8110e90b>] kmem_cache_alloc+0xd9/0x113 [ 47.332912] RSP: 0018:ffff880114493c18 EFLAGS: 00010282 [ 47.332926] RAX: 0000000000000000 RBX: ffff880115d59de0 RCX: 0000000000001282 [ 47.332928] RDX: 0000000000001281 RSI: 00000000000000d0 RDI: ffff88011a003900 [ 47.332930] RBP: 0000000000000001 R08: 0000000000015520 R09: ffffffff8104a2ae [ 47.332932] R10: ffff8800db07f4d8 R11: ffffffff81bd1d18 R12: ffff88011a003900 [ 47.332934] R13: 00000000000000d0 R14: ffffffff8104f7cb R15: 0000000000800111 [ 47.332936] FS: 0000000000000000(0000) GS:ffff88011eac0000(0000) knlGS:0000000000000000 [ 47.332938] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 47.332940] CR2: 0000000000000001 CR3: 00000001167d0000 CR4: 00000000001407e0 [ 47.332994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 47.332996] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 47.332998] Stack: [ 47.332999] 0000000000000000 ffff880115d59de0 ffffffff81bc1e00 0000000000000000 [ 47.333003] ffff8800025e4000 ffff8800daecc000 0000000000800111 ffffffff8104f7cb [ 47.333007] ffff880114493d54 ffffffff8110e859 0000000000000000 0000000000000000 [ 47.333011] Call Trace: [ 47.333014] [<ffffffff8104f7cb>] ? alloc_pid+0x22/0x3b7 [ 47.333017] [<ffffffff8110e859>] ? kmem_cache_alloc+0x27/0x113 [ 47.333020] [<ffffffff8110e859>] ? kmem_cache_alloc+0x27/0x113 [ 47.333024] [<ffffffff8103a948>] ? copy_process.part.36+0xd98/0x160a [ 47.333027] [<ffffffff8104a2ae>] ? __call_usermodehelper+0x3e/0x3e [ 47.333030] [<ffffffff8103b337>] ? do_fork+0xb4/0x2e0 [ 47.333032] [<ffffffff8103b581>] ? kernel_thread+0x1e/0x20 [ 47.333035] [<ffffffff8104a29f>] ? __call_usermodehelper+0x2f/0x3e [ 47.333038] [<ffffffff8104d108>] ? process_one_work+0x1ad/0x31a [ 47.333041] [<ffffffff8104d4ef>] ? worker_thread+0x255/0x350 [ 47.333044] [<ffffffff8104d29a>] ? process_scheduled_works+0x25/0x25 [ 47.333046] [<ffffffff81050dc2>] ? kthread+0xc5/0xcd [ 47.333049] [<ffffffff81050cfd>] ? kthread_freezable_should_stop+0x40/0x40 [ 47.333053] [<ffffffff8179ef2c>] ? ret_from_fork+0x7c/0xb0 [ 47.333056] [<ffffffff81050cfd>] ? kthread_freezable_should_stop+0x40/0x40 [ 47.333058] Code: e9 4d 89 f8 4c 89 f1 48 89 ea 48 89 de 41 ff 14 24 49 83 c4 10 49 83 3c 24 00 eb 36 eb 38 49 63 44 24 20 4d 8b 04 24 48 8d 4a 01 <48> 8b 5c 05 00 48 89 e8 65 49 0f c7 08 0f 94 c0 84 c0 0f 85 78 [ 47.333098] RIP [<ffffffff8110e90b>] kmem_cache_alloc+0xd9/0x113 [ 47.333101] RSP <ffff880114493c18> [ 47.333102] CR2: 0000000000000001 [ 47.333105] ---[ end trace 0e850f2f346708bd ]--- ==Reproduce steps== ---------------------------- 1. ./drv_module_reload
Big-bada-boom. Please, please bisect.
ea2c67bb4affa84080c616920f3899f123786e56 is the first bad commit commit ea2c67bb4affa84080c616920f3899f123786e56 Author: Matt Roper <matthew.d.roper@intel.com> Date: Tue Dec 23 10:41:52 2014 -0800 drm/i915: Move to atomic plane helpers (v9) Switch plane handling to use the atomic plane helpers. This means that rather than provide our own implementations of .update_plane() and .disable_plane(), we expose the lower-level check/prepare/commit/cleanup entrypoints and let the DRM core implement update/disable for us using those entrypoints. The other main change that falls out of this patch is that our drm_plane's will now always have a valid plane->state that contains the relevant plane state (initial state is allocated at plane creation). The base drm_plane_state pointed to holds the requested source/dest coordinates, and the subclassed intel_plane_state holds the adjusted values that our driver actually uses. v2: - Renamed file from intel_atomic.c to intel_atomic_plane.c (Daniel) - Fix a copy/paste comment mistake (Bob) v3: - Use prepare/cleanup functions that we've already factored out - Use newly refactored pre_commit/commit/post_commit to avoid sleeping during vblank evasion v4: - Rebase to latest di-nightly requires adding an 'old_state' parameter to atomic_update; v5: - Must have botched a rebase somewhere and lost some work. Restore state 'dirty' flag to let begin/end code know which planes to run the pre_commit/post_commit hooks for. This would have actually shown up as broken in the next commit rather than this one. v6: - Squash kerneldoc patch into this one. - Previous patches have now already taken care of most of the infrastructure that used to be in this patch. All we're adding here now is some thin wrappers. v7: - Check return of intel_plane_duplicate_state() for allocation failures. v8: - Drop unused drm_plane_state -> intel_plane_state cast. (Ander) - Squash in actual transition to plane helpers. Significant refactoring earlier in the patchset has made the combined prep+transition much easier to swallow than it was in earlier iterations. (Ander) v9: - s/track_fbs/disabled_planes/ in the atomic crtc flags. The only fb's we need to update frontbuffer tracking for are those on a plane about to be disabled (since the atomic helpers never call prepare_fb() when disabling a plane), so the new name more accurately describes what we're actually tracking. Testcase: igt/kms_plane Testcase: igt/kms_universal_plane Testcase: igt/kms_cursor_crc Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> :040000 040000 a2d045dfd034bfb9ec7a5ffb799fa148ff4047c3 1f7f88d3e9fe9cbe7b8409c1d868 52c65808d7a3 M Documentation :040000 040000 7660ab19978e64dac501827a5b4b1e050e88b529 68cd800423301596e3245b0410dc 1d1a19fc3b35 M drivers
(In reply to Li Xu from comment #2) > ea2c67bb4affa84080c616920f3899f123786e56 is the first bad commit > commit ea2c67bb4affa84080c616920f3899f123786e56 > Author: Matt Roper <matthew.d.roper@intel.com> > Date: Tue Dec 23 10:41:52 2014 -0800 > > drm/i915: Move to atomic plane helpers (v9) > commit ea2c67bb4affa84080c616920f3899f123786e56 Author: Matt Roper <matthew.d.roper@intel.com> AuthorDate: Tue Dec 23 10:41:52 2014 -0800 Commit: Daniel Vetter <daniel.vetter@ffwll.ch> CommitDate: Mon Jan 12 23:59:31 2015 +0100 drm/i915: Move to atomic plane helpers (v9)
Please always assign to bisected bad commit author! I'm also CC'ing Ander for being the reviewer. At a very quick glance, only intel_plane_duplicate_state looks suspicious. Is plane->state always valid when non-NULL? Matt, Ander?
(In reply to Jani Nikula from comment #4) > Please always assign to bisected bad commit author! I'm also CC'ing Ander > for being the reviewer. > > At a very quick glance, only intel_plane_duplicate_state looks suspicious. > Is plane->state always valid when non-NULL? > > Matt, Ander? It should, but maybe I missed something in the review. But the error checking in intel_plane_duplicate_state() and callers seems to be correct. Perhaps the bug is in the plane helpers.
(In reply to Ander Conselvan de Oliveira from comment #5) > (In reply to Jani Nikula from comment #4) > > Please always assign to bisected bad commit author! I'm also CC'ing Ander > > for being the reviewer. > > > > At a very quick glance, only intel_plane_duplicate_state looks suspicious. > > Is plane->state always valid when non-NULL? > > > > Matt, Ander? > > It should, but maybe I missed something in the review. But the error > checking in intel_plane_duplicate_state() and callers seems to be correct. > Perhaps the bug is in the plane helpers. Actually the problem is somewhere with intel_plane_destroy. intel_plane_state_destroy doesn't set the plane to NULL and then something bad happens when drm_plane_cleanup tries to destroy it again.
(In reply to Ander Conselvan de Oliveira from comment #6) > (In reply to Ander Conselvan de Oliveira from comment #5) > > (In reply to Jani Nikula from comment #4) > > > Please always assign to bisected bad commit author! I'm also CC'ing Ander > > > for being the reviewer. > > > > > > At a very quick glance, only intel_plane_duplicate_state looks suspicious. > > > Is plane->state always valid when non-NULL? > > > > > > Matt, Ander? > > > > It should, but maybe I missed something in the review. But the error > > checking in intel_plane_duplicate_state() and callers seems to be correct. > > Perhaps the bug is in the plane helpers. > > Actually the problem is somewhere with intel_plane_destroy. > intel_plane_state_destroy doesn't set the plane to NULL and then something > bad happens when drm_plane_cleanup tries to destroy it again. Yep, Ander's analysis looks correct. When I first wrote the patchset, the DRM core wasn't cleaning up plane state, so I had to call the state destruction in intel_plane_destroy() so that we wouldn't leak it on driver unload. But the core got updated to do the destruction before my patches actually landed and I didn't notice before this series got merged, so we're now doing a double kfree() (and possibly a double framebuffer unreference as well). The fix is to just not cleanup the plane state in the driver anymore since the core will handle it for us. I'll send a patch for that shortly.
The following patch should solve the problem: [PATCH] drm/i915: Don't cleanup plane state in intel_plane_destroy()
(In reply to Matt Roper from comment #8) > The following patch should solve the problem: > > [PATCH] drm/i915: Don't cleanup plane state in intel_plane_destroy() Also available at http://patchwork.freedesktop.org/patch/40566/ - please test.
(In reply to Jani Nikula from comment #9) > (In reply to Matt Roper from comment #8) > > The following patch should solve the problem: > > > > [PATCH] drm/i915: Don't cleanup plane state in intel_plane_destroy() > > Also available at http://patchwork.freedesktop.org/patch/40566/ - please > test. Apply this patch on the latest -nightly kernel, the hang goes away. output shows "module successfully unloaded" but not reload, report bug 88573 to track it. I think your patch fixed this bug.
commit fd7bf8b626e1f9cfb9cd5c3104dcedae2569899b Author: Matt Roper <matthew.d.roper@intel.com> Date: Fri Jan 16 07:25:24 2015 -0800 drm/i915: Don't cleanup plane state in intel_plane_destroy()
Verified.Fixed on the latest -nightly kernel. [root@x-hsw27 tests]# ./drv_module_reload unbinding /sys/class/vtconsole/vtcon0/: (M) frame buffer device module successfully unloaded module successfully loaded again
Closing old verified+fixed
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.