Created attachment 96341 [details] dmesg

System Environment:
--------------------------
Platform: Haswell
Kernel: drm-intel-fixes/0f4706d2740f2a221cd502922b22e522009041d9

Bug detailed description:
-----------------------------
The test causes a call trace on all platforms with the -nightly, -fixes and -queued kernels. Testing on an earlier commit shows the same issue.

Call trace:
[ 200.944162] ------------[ cut here ]------------
[ 200.944258] WARNING: CPU: 7 PID: 4296 at fs/sysfs/group.c:216 device_del+0x39/0x16a()
[ 200.944406] sysfs group ffffffff81af86d0 not found for kobject 'i2c-6'
[ 200.944500] Modules linked in: i915(-) drm_kms_helper drm ipv6 dm_mod snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi pcspkr serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support lpc_ich mfd_core snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore battery tpm_infineon tpm_tis tpm wmi acpi_cpufreq video button [last unloaded: snd_hda_intel]
[ 200.945759] CPU: 7 PID: 4296 Comm: rmmod Tainted: G W 3.14.0-rc5_drm-intel-fixes_0f4706_20140323+ #950
[ 200.945906] Hardware name: ASUS All Series/Z87-EXPERT, BIOS 1008 05/17/2013
[ 200.945998] 0000000000000000 0000000000000009 ffffffff81716b43 ffff880251395be8
[ 200.946256] ffffffff81035052 ffffffff00000001 ffffffff81379b9f ffffffff00000001
[ 200.946514] ffff880252747c00 0000000000000002 ffff88025269e150 0000000000000000
[ 200.946773] Call Trace:
[ 200.946861] [<ffffffff81716b43>] ? dump_stack+0x41/0x51
[ 200.946950] [<ffffffff81035052>] ? warn_slowpath_common+0x73/0x8b
[ 200.947040] [<ffffffff81379b9f>] ? device_del+0x39/0x16a
[ 200.947128] [<ffffffff81035102>] ? warn_slowpath_fmt+0x45/0x4a
[ 200.947217] [<ffffffff81379b9f>] ? device_del+0x39/0x16a
[ 200.947306] [<ffffffff81379cd9>] ? device_unregister+0x9/0x12
[ 200.947395] [<ffffffff81379d16>] ? device_destroy+0x34/0x3a
[ 200.947485] [<ffffffff816212d6>] ? i2cdev_detach_adapter+0x3e/0x42
[ 200.947574] [<ffffffff8171ef87>] ? notifier_call_chain+0x2e/0x59
[ 200.947665] [<ffffffff8104e49b>] ? __blocking_notifier_call_chain+0x43/0x5d
[ 200.947756] [<ffffffff81379b97>] ? device_del+0x31/0x16a
[ 200.947846] [<ffffffff81379cd9>] ? device_unregister+0x9/0x12
[ 200.947936] [<ffffffff81620d27>] ? i2c_del_adapter+0x190/0x1d6
[ 200.948032] [<ffffffffa02e557a>] ? intel_dp_encoder_destroy+0x1d/0x62 [i915]
[ 200.948126] [<ffffffffa01d220e>] ? drm_mode_config_cleanup+0x2d/0x216 [drm]
[ 200.948220] [<ffffffffa01ceb54>] ? drm_sysfs_connector_remove+0x74/0x80 [drm]
[ 200.948364] [<ffffffffa02daf6f>] ? intel_modeset_cleanup+0xd1/0xe0 [i915]
[ 200.948456] [<ffffffffa02af96c>] ? i915_driver_unload+0xb6/0x2a0 [i915]
[ 200.948548] [<ffffffffa01cc127>] ? drm_dev_unregister+0x21/0x88 [drm]
[ 200.948641] [<ffffffffa01cc8fe>] ? drm_put_dev+0x48/0x51 [drm]
[ 200.948730] [<ffffffff812f8cee>] ? pci_device_remove+0x38/0x80
[ 200.948819] [<ffffffff8137c23d>] ? __device_release_driver+0x82/0xdb
[ 200.948910] [<ffffffff8137c914>] ? driver_detach+0x6e/0x9a
[ 200.948998] [<ffffffff8137c0a2>] ? bus_remove_driver+0x60/0x7e
[ 200.949087] [<ffffffff812f8e57>] ? pci_unregister_driver+0x17/0x75
[ 200.949177] [<ffffffffa01cdec9>] ? drm_pci_exit+0x3c/0xa0 [drm]
[ 200.949268] [<ffffffff8107e5c3>] ? SyS_delete_module+0x123/0x199
[ 200.949358] [<ffffffff8171c4b2>] ? page_fault+0x22/0x30
[ 200.949447] [<ffffffff817211a2>] ? system_call_fastpath+0x16/0x1b
[ 200.949536] ---[ end trace 32f16d9b1d6381ab ]---

Reproduce steps:
----------------------------
1. ./drv_module_reload
Please bisect.
Is this still an issue? And I think you need to look harder, this was definitely working on older kernels. If it is still an issue, please analyze and provide a bisect.
8f6599da8e772fa8de54cdf98e9e03cbaf3946da is the first bad commit. Same as https://bugs.freedesktop.org/show_bug.cgi?id=71208#c5

commit 8f6599da8e772fa8de54cdf98e9e03cbaf3946da
Author: David Herrmann <dh.herrmann@gmail.com>
AuthorDate: Sun Oct 20 18:55:45 2013 +0200
Commit: Dave Airlie <airlied@redhat.com>
CommitDate: Wed Nov 6 14:53:25 2013 +1000

    drm: delay minor destruction to drm_dev_free()

    Instead of freeing minors in drm_dev_unregister(), we only unplug them and delay the free to drm_dev_free(). Note that if drm_dev_register() has never been called, minors are NULL and this has no effect.

    This change is needed to allow early device unregistration. If we want to call drm_dev_unregister() on live devices, we need to guarantee that minors are still valid (but unplugged). This way, any open file can still access file_priv->minor->dev to get the DRM device. However, the minor is unplugged so no new users can occur.

    Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
Tbh I'm a bit confused. Adding David Herrmann for insight ...
I am confused by the bisect. The commit in question just delays a free(), so my first guess was that some i915 code checks for "dev->primary != NULL" and then does a deregistration, while it should rather check for "dev->primary->kdev != NULL". On the other hand, no-one should check for that at all and just expect them to be there during ->unload().

But then again, looking at the backtrace, we're currently in the i915 ->unload() path, so the code modified by the commit hasn't even been called yet. Furthermore, the warning happens _deep_ down the i2c chain (unregistering i2c-devices on top of i2c-adapters).

A few things that bug me:
* this code is 1/2 a year old, why does this warning show up only _now_?
* the bisected commit breaks module re-loading, but that was already fixed
* the code in question hasn't even been called yet

My suspicion is that you mixed up two different calltraces. Your comment "Same as https://bugs.freedesktop.org/show_bug.cgi?id=71208#c5" is definitely wrong. The stack-traces in that bug are on the devices created by DRM, unlike this trace which is in the i2c layer.

Can you please bisect again _starting_ at least after this:

commit a3483353ca4e6dbeef2ed62ebed01af109b5b27a
Author: David Herrmann <dh.herrmann@gmail.com>
Date: Wed Nov 13 11:42:26 2013 +0100

    drm: check for !kdev in drm_unplug_minor()

And please verify the stack-traces contain "i2cdev_detach_adapter".
Set NEEDINFO, otherwise our QA won't take action.
Bisected it again. db31af1d4e815e141295b0bdf8da3e77885001d5 is the first bad commit.

commit db31af1d4e815e141295b0bdf8da3e77885001d5
Author: Jani Nikula <jani.nikula@intel.com>
Date: Fri Nov 8 16:48:53 2013 +0200

    drm/i915: clean up backlight conditional build

    I've always felt the backlight device conditional build has been all backwards. Make it feel right. Gently move things towards connector based stuff while at it.

    There should be no functional changes.

    Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    Reviewed-by: Imre Deak <imre.deak@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(In reply to comment #7)
> Bisect it again.
> db31af1d4e815e141295b0bdf8da3e77885001d5 is the first bad commit
> commit db31af1d4e815e141295b0bdf8da3e77885001d5
> Author: Jani Nikula <jani.nikula@intel.com>
> Date: Fri Nov 8 16:48:53 2013 +0200
>
> drm/i915: clean up backlight conditional build
>

There was a related fix after this commit, so I think it's a more recent regression. Could you do - yet another - bisect between

commit 931c1c26983b4f84e33b78579fc8d57e4a14c6b4
Author: Imre Deak <imre.deak@intel.com>
Date: Tue Feb 11 17:12:51 2014 +0200

    drm/i915: sdvo: add i2c sysfs symlink to the connector's directory

and current -nightly?
(In reply to comment #8)
> (In reply to comment #7)
> > Bisect it again.
> > db31af1d4e815e141295b0bdf8da3e77885001d5 is the first bad commit
> > commit db31af1d4e815e141295b0bdf8da3e77885001d5
> > Author: Jani Nikula <jani.nikula@intel.com>
> > Date: Fri Nov 8 16:48:53 2013 +0200
> >
> > drm/i915: clean up backlight conditional build
> >
>
> There was a related fix after this commit, so I think it's a more recent
> regression. Could you do - yet another - bisect between
>
> commit 931c1c26983b4f84e33b78579fc8d57e4a14c6b4
> Author: Imre Deak <imre.deak@intel.com>
> Date: Tue Feb 11 17:12:51 2014 +0200
>
> drm/i915: sdvo: add i2c sysfs symlink to the connector's directory
>
> and current -nightly?

I selected 931c1c26983b4f84e33b78579fc8d57e4a14c6b4 as the good commit. Many commits fail with "./drv_module_reload: line 43: /sys/class/vtconsole/vtcon1/bind: No such file or directory", so I skipped those commits.

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
2eb4c7b1e7f275fe833aabe0a251b8e3f767fb08
3ae471f73a1d581e078b5b06d08d7b82833a093f
fc275a74eb816c12d4fc226344e734872ed0b2f9
2e9a3fc3a360ac180f5b4c3c4416a0d0dec60dd8
6ae668cc19e8b18df28cd67b3448d9abd79284a4
71c68c4fc9bdcd6e46107a0f40b50a523f3b4fe0
7288ca07b638db485abec5752bd6b1faed1c33ef
9e541466eed411cb5462fa9e6181c4d409e7e2ef
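The skipped commits above failed for a reason unrelated to the warning itself (the vtconsole bind path was missing). If the bisect is automated with `git bisect run`, exit code 125 tells git the commit cannot be tested, 0 means good, and any other code from 1 to 127 means bad. Below is a minimal sketch of that mapping, assuming the test script output and the dmesg text have already been captured as strings. The function name `bisect_exit_code` and the two marker strings are illustrative assumptions taken from the messages quoted in this report, not part of any existing tool.

```python
# Hypothetical helper: map one test run to a `git bisect run` exit code.
GOOD, BAD, SKIP = 0, 1, 125

# Assumed markers, copied from the messages quoted in this bug report.
SKIP_MARKER = "/sys/class/vtconsole/vtcon1/bind: No such file or directory"
BAD_MARKER = "not found for kobject"


def bisect_exit_code(test_output: str, dmesg: str) -> int:
    """Return 125 if the test could not run, 1 if the WARN fired, else 0."""
    if SKIP_MARKER in test_output:
        return SKIP  # infrastructure failure: let git pick another commit
    if BAD_MARKER in dmesg:
        return BAD   # the sysfs group warning from this report
    return GOOD
```

A wrapper script would call `sys.exit(bisect_exit_code(...))` so that git only skips commits where the test genuinely could not run.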
(In reply to comment #9)
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> 2eb4c7b1e7f275fe833aabe0a251b8e3f767fb08
> 3ae471f73a1d581e078b5b06d08d7b82833a093f
> fc275a74eb816c12d4fc226344e734872ed0b2f9
> 2e9a3fc3a360ac180f5b4c3c4416a0d0dec60dd8
> 6ae668cc19e8b18df28cd67b3448d9abd79284a4
> 71c68c4fc9bdcd6e46107a0f40b50a523f3b4fe0
> 7288ca07b638db485abec5752bd6b1faed1c33ef
> 9e541466eed411cb5462fa9e6181c4d409e7e2ef

All these commits are for drm/i2c/tda998x which we don't use at all for our driver. I suspect something has gone wrong with the bisect, can you please double-check?
(In reply to comment #10)
> (In reply to comment #9)
> > There are only 'skip'ped commits left to test.
> > The first bad commit could be any of:
> > 2eb4c7b1e7f275fe833aabe0a251b8e3f767fb08
> > 3ae471f73a1d581e078b5b06d08d7b82833a093f
> > fc275a74eb816c12d4fc226344e734872ed0b2f9
> > 2e9a3fc3a360ac180f5b4c3c4416a0d0dec60dd8
> > 6ae668cc19e8b18df28cd67b3448d9abd79284a4
> > 71c68c4fc9bdcd6e46107a0f40b50a523f3b4fe0
> > 7288ca07b638db485abec5752bd6b1faed1c33ef
> > 9e541466eed411cb5462fa9e6181c4d409e7e2ef
>
> All these commits are for drm/i2c/tda998x which we don't use at all for our
> driver. I suspect something has gone wrong with the bisect, can you please
> double-check?

I will bisect it again.
good commit: b2040f6fed736ccd2319768bc59833abe74148b8
bad commit: 33688d95c458ffca6b247189cc6f15277fd6abf0
Bisect shows: 1c61eae469e0d1d2fb9d7b77f51ca50c1f8f3ce9 is the first bad commit

commit 1c61eae469e0d1d2fb9d7b77f51ca50c1f8f3ce9
Author: Christian König <christian.koenig@amd.com>
Date: Tue Feb 18 01:50:22 2014 -0700

    drm/radeon: fix CP semaphores on CIK

    The CP semaphore queue on CIK has a bug that triggers if uncompleted waits use the same address while a signal is still pending. Work around this by using different addresses for each sync.

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Cc: stable@vger.kernel.org
(In reply to comment #12)
> Bisect shows: 1c61eae469e0d1d2fb9d7b77f51ca50c1f8f3ce9 is the first bad
> commit
> commit 1c61eae469e0d1d2fb9d7b77f51ca50c1f8f3ce9
> Author: Christian König <christian.koenig@amd.com>
> Date: Tue Feb 18 01:50:22 2014 -0700
>
> drm/radeon: fix CP semaphores on CIK

I don't think there's any way this could be the culprit.

Given the multiple different bisect results, I suspect the issue you're seeing occurs sometimes, but not always, so you can't rely on a single good test result for bisection.
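One way to act on this when the failure is intermittent is to repeat the test several times per commit and only call the commit good if every run passes. The sketch below illustrates that rule under stated assumptions: `run_once` is a hypothetical callable that reloads the module and returns True when no warning appeared, and the default of 5 runs is an arbitrary choice, not a recommendation from this thread.

```python
from typing import Callable


def classify_commit(run_once: Callable[[], bool], runs: int = 5) -> str:
    """Classify a commit for bisection when the failure is intermittent.

    A single failing run is enough to mark the commit 'bad'; it is only
    'good' if all `runs` attempts pass.
    """
    for _ in range(runs):
        if not run_once():
            return "bad"
    return "good"
```

Even with repeats, a "good" verdict is only probabilistic; a commit that fails rarely can still slip through, so more runs give more confidence at the cost of a slower bisect.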
It passes 5 in 5 runs on commit:b2040f6fed7
It fails 5 in 5 runs on commit:33688d95c45
(In reply to comment #14)
> It passes 5 in 5 runs on commit:b2040f6fed7
> It fails 5 in 5 runs on commit:33688d95c45

$ git log --oneline b2040f6fed7..33688d95c45 | wc -l
339

Please bisect between these two.
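For a sense of the effort involved: a binary search over N candidate commits needs roughly log2(N) test rounds. The sketch below computes that estimate; the function name is hypothetical, and the figure is only approximate for a real kernel history, which is not linear and may contain commits that have to be skipped.

```python
import math


def estimated_bisect_steps(commit_count: int) -> int:
    """Rough upper bound on test rounds for a binary search over commits."""
    if commit_count < 1:
        raise ValueError("commit_count must be at least 1")
    return math.ceil(math.log2(commit_count))


# The range quoted above contains 339 commits.
print(estimated_bisect_steps(339))
```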
(In reply to comment #15)
> (In reply to comment #14)
> > It passes 5 in 5 runs on commit:b2040f6fed7
> > It fails 5 in 5 runs on commit:33688d95c45
>
> $ git log --oneline b2040f6fed7..33688d95c45 | wc -l
> 339
>
> Please bisect into these two.

Comment 12's bisect result is between these 2 commits.
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > It passes 5 in 5 runs on commit:b2040f6fed7
> > > It fails 5 in 5 runs on commit:33688d95c45
> >
> > $ git log --oneline b2040f6fed7..33688d95c45 | wc -l
> > 339
> >
> > Please bisect into these two.
>
> Comment 12's bisect result is between these 2 commits

Maybe, but it's a change in Radeon code, not our code. I don't believe the result is correct.
(In reply to comment #14)
> It passes 5 in 5 runs on commit:b2040f6fed7
> It fails 5 in 5 runs on commit:33688d95c45

Retested on commit b2040f6fed7; it also causes the call trace.
Re-bisected it. 24b9bf43e93e0edd89072da51cf1fab95fc69dec is the first bad commit.

commit 24b9bf43e93e0edd89072da51cf1fab95fc69dec
Author: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Mon Mar 3 23:19:18 2014 +0100

    net: fix for a race condition in the inet frag code

    I stumbled upon this very serious bug while hunting for another one, it's a very subtle race condition between inet_frag_evictor, inet_frag_intern and the IPv4/6 frag_queue and expire functions (basically the users of inet_frag_kill/inet_frag_put).

    What happens is that after a fragment has been added to the hash chain but before it's been added to the lru_list (inet_frag_lru_add) in inet_frag_intern, it may get deleted (either by an expired timer if the system load is high or the timer sufficiently low, or by the fraq_queue function for different reasons) before it's added to the lru_list, then after it gets added it's a matter of time for the evictor to get to a piece of memory which has been freed leading to a number of different bugs depending on what's left there.

    I've been able to trigger this on both IPv4 and IPv6 (which is normal as the frag code is the same), but it's been much more difficult to trigger on IPv4 due to the protocol differences about how fragments are treated.

With this commit reverted, a new warning and call trace appear:
[ 1.357371] ------------[ cut here ]------------
[ 1.357376] WARNING: CPU: 0 PID: 1230 at drivers/gpu/drm/drm_modes.c:119 drm_mode_probed_add+0x27/0x41 [drm]()
[ 1.357376] Modules linked in: firewire_ohci(+) firewire_core crc_itu_t i915(+) video drm_kms_helper drm floppy button
[ 1.357381] CPU: 0 PID: 1230 Comm: systemd-udevd Tainted: G W 3.14.0-rc7_queued_revert_24b9bf43e_20140429+ #1
[ 1.357382] Hardware name: Gigabyte Technology Co., Ltd. H55M-UD2H/H55M-UD2H, BIOS F4 12/02/2009
[ 1.357383] 0000000000000000 0000000000000009 ffffffff81716de3 0000000000000000
[ 1.357385] ffffffff81035052 ffff88003734f000 ffffffffa0029754 0000000000004ba5
[ 1.357386] ffff880111359300 ffff8800d368ec00 ffff8800d35a1500 0000000000004ba5
[ 1.357388] Call Trace:
[ 1.357390] [<ffffffff81716de3>] ? dump_stack+0x41/0x51
[ 1.357392] [<ffffffff81035052>] ? warn_slowpath_common+0x73/0x8b
[ 1.357396] [<ffffffffa0029754>] ? drm_mode_probed_add+0x27/0x41 [drm]
[ 1.357400] [<ffffffffa0029754>] ? drm_mode_probed_add+0x27/0x41 [drm]
[ 1.357403] [<ffffffffa002c45f>] ? drm_add_edid_modes+0x2d6/0xd02 [drm]
[ 1.357408] [<ffffffffa002575f>] ? drm_mode_object_get+0x51/0x60 [drm]
[ 1.357423] [<ffffffffa00b7790>] ? intel_connector_update_modes+0x1c/0x36 [i915]
[ 1.357425] [<ffffffff8171b390>] ? mutex_lock+0x9/0x25
[ 1.357441] [<ffffffffa00c0b1c>] ? intel_crt_ddc_get_modes+0x21/0x3c [i915]
[ 1.357458] [<ffffffffa00c0b7d>] ? intel_crt_get_modes+0x46/0x8a [i915]
[ 1.357471] [<ffffffffa005f5b3>] ? drm_helper_probe_single_connector_modes+0x138/0x2d2 [drm_kms_helper]
[ 1.357475] [<ffffffffa0060318>] ? drm_fb_helper_probe_connector_modes+0x38/0x4c [drm_kms_helper]
[ 1.357477] [<ffffffffa006127c>] ? drm_fb_helper_initial_config+0x1ab/0x450 [drm_kms_helper]
[ 1.357480] [<ffffffff810d9b6d>] ? kmem_cache_alloc+0x23/0xac
[ 1.357497] [<ffffffffa00a1374>] ? gen5_write32+0x21/0x47 [i915]
[ 1.357514] [<ffffffffa0096bb7>] ? ibx_display_interrupt_update+0x91/0xb4 [i915]
[ 1.357531] [<ffffffffa00a1374>] ? gen5_write32+0x21/0x47 [i915]
[ 1.357553] [<ffffffffa00d8865>] ? i915_driver_load+0xbad/0xe1e [i915]
[ 1.357560] [<ffffffffa002049f>] ? drm_dev_register+0x74/0xe7 [drm]
[ 1.357565] [<ffffffffa0022729>] ? drm_get_pci_dev+0xff/0x1bc [drm]
[ 1.357567] [<ffffffff81384e55>] ? __pm_runtime_resume+0x5b/0x6a
[ 1.357569] [<ffffffff812f8bc9>] ? local_pci_probe+0x35/0x7a
[ 1.357572] [<ffffffff8137c904>] ? driver_probe_device+0x1b3/0x1b3
[ 1.357574] [<ffffffff812f8e6c>] ? pci_device_probe+0xcc/0xf0
[ 1.357576] [<ffffffff8137c7e3>] ? driver_probe_device+0x92/0x1b3
[ 1.357578] [<ffffffff8137c957>] ? __driver_attach+0x53/0x73
[ 1.357580] [<ffffffff8137b0be>] ? bus_for_each_dev+0x4e/0x7f
[ 1.357582] [<ffffffff8137c065>] ? bus_add_driver+0xe2/0x1c7
[ 1.357585] [<ffffffff8137ce9a>] ? driver_register+0x82/0xb5
[ 1.357587] [<ffffffffa010e000>] ? 0xffffffffa010dfff
[ 1.357589] [<ffffffff81000296>] ? do_one_initcall+0x78/0xfa
[ 1.357591] [<ffffffff8104e4af>] ? __blocking_notifier_call_chain+0x4f/0x5d
[ 1.357594] [<ffffffff8107fe72>] ? load_module+0x1745/0x1a13
[ 1.357596] [<ffffffff8107da98>] ? mod_kobject_put+0x42/0x42
[ 1.357599] [<ffffffff81080229>] ? SyS_finit_module+0x4e/0x62
[ 1.357602] [<ffffffff817214a2>] ? system_call_fastpath+0x16/0x1b
[ 1.357603] ---[ end trace e75cbd96bfbd4fea ]---
[ 1.357605] ------------[ cut here ]------------
This bug has conflated several different WARNs. Let's start afresh.
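To keep distinct WARNs apart in future reports, each one can be identified by the source location and function named on its "WARNING:" line. The sketch below extracts those pairs from a dmesg dump. The regular expression is an assumption based only on the two WARNING lines quoted in this report; other kernel versions may format the line differently.

```python
import re
from typing import Set, Tuple

# Assumed format, based on the WARNING lines quoted in this report:
#   WARNING: CPU: 7 PID: 4296 at fs/sysfs/group.c:216 device_del+0x39/0x16a()
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+:\d+) (\w+)\+0x")


def warn_signatures(dmesg: str) -> Set[Tuple[str, str]]:
    """Return the set of (file:line, function) pairs for every WARN found."""
    return {(m.group(1), m.group(2)) for m in WARN_RE.finditer(dmesg)}
```

Two dmesg logs with different signature sets are showing different warnings and probably belong in separate bug reports, which is what happened here with the `fs/sysfs/group.c:216` and `drivers/gpu/drm/drm_modes.c:119` traces.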
Closing old verified.