Bug 109241

Summary: [CI][BAT] all suspend tests - dmesg-warn - called from state HALTED
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/IntelAssignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: highest CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: ALL i915 features:

Description Martin Peres 2019-01-07 15:39:31 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5366/fi-snb-2600/igt%40gem_exec_suspend%40basic-s3.html

<4> [74.170194] ------------[ cut here ]------------
<4> [74.170220] called from state HALTED
<4> [74.170237] WARNING: CPU: 3 PID: 2473 at drivers/net/phy/phy.c:548 phy_start_aneg+0xf1/0x140
<4> [74.170239] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic asix usbnet mii i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core broadcom bcm_phy_lib snd_pcm tg3 mei_me i2c_i801 mei prime_numbers lpc_ich
<4> [74.170258] CPU: 3 PID: 2473 Comm: kworker/u16:22 Tainted: G     U            5.0.0-rc1-CI-CI_DRM_5366+ #1
<4> [74.170260] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
<4> [74.170264] Workqueue: events_unbound async_run_entry_fn
<4> [74.170269] RIP: 0010:phy_start_aneg+0xf1/0x140
<4> [74.170271] Code: 10 89 93 d0 04 00 00 0f b6 40 04 89 83 d4 04 00 00 e9 74 ff ff ff 48 8b 34 c5 20 a7 ed 81 48 c7 c7 a6 7c 11 82 e8 9f 94 9d ff <0f> 0b 41 bc f0 ff ff ff eb 91 80 a3 c0 04 00 00 7f eb a3 48 b8 ff
<4> [74.170273] RSP: 0018:ffffc9000044bc80 EFLAGS: 00010282
<4> [74.170276] RAX: 0000000000000000 RBX: ffff888216e38958 RCX: 0000000000000000
<4> [74.170278] RDX: 0000000000000000 RSI: ffffffff8213044a RDI: 00000000ffffffff
<4> [74.170280] RBP: ffff888216e38f00 R08: 00000000c14614ba R09: 0000000000000000
<4> [74.170282] R10: ffffc9000044bc80 R11: 0000000000000000 R12: ffff888216e38958
<4> [74.170284] R13: 0000000000000000 R14: ffff888223655f28 R15: ffff88821af6da58
<4> [74.170287] FS:  0000000000000000(0000) GS:ffff888227ac0000(0000) knlGS:0000000000000000
<4> [74.170289] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [74.170291] CR2: 000055f1dae80d80 CR3: 0000000002214006 CR4: 00000000000606e0
<4> [74.170293] Call Trace:
<4> [74.170301]  tg3_power_down_prepare+0x754/0xa30 [tg3]
<4> [74.170308]  tg3_suspend+0x1e5/0x350 [tg3]
<4> [74.170316]  pci_pm_suspend+0x6d/0x120
<4> [74.170319]  ? pci_pm_freeze+0xc0/0xc0
<4> [74.170324]  dpm_run_callback+0x64/0x280
<4> [74.170330]  __device_suspend+0x12a/0x5b0
<4> [74.170335]  ? dpm_watchdog_set+0x60/0x60
<4> [74.170344]  async_suspend+0x15/0x90
<4> [74.170347]  async_run_entry_fn+0x34/0x160
<4> [74.170352]  process_one_work+0x245/0x610
<4> [74.170360]  worker_thread+0x37/0x380
<4> [74.170365]  ? process_one_work+0x610/0x610
<4> [74.170368]  kthread+0x119/0x130
<4> [74.170371]  ? kthread_park+0x80/0x80
<4> [74.170377]  ret_from_fork+0x3a/0x50
<4> [74.170388] irq event stamp: 648
<4> [74.170392] hardirqs last  enabled at (647): [<ffffffff81123be2>] vprintk_emit+0x302/0x320
<4> [74.170396] hardirqs last disabled at (648): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [74.170399] softirqs last  enabled at (626): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [74.170402] softirqs last disabled at (605): [<ffffffff81a00e1a>] do_softirq_own_stack+0x2a/0x40
<4> [74.170405] WARNING: CPU: 3 PID: 2473 at drivers/net/phy/phy.c:548 phy_start_aneg+0xf1/0x140
<4> [74.170407] ---[ end trace e382359f2ec4f600 ]---
Comment 1 CI Bug Log 2019-01-07 15:40:03 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SNB: all suspend tests - dmesg-warn - called from state HALTED
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5366/shard-snb2/igt@i915_suspend@shrink.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5367/shard-snb7/igt@i915_suspend@shrink.html
Comment 2 Chris Wilson 2019-01-07 16:55:29 UTC
A netdev bug that isn't e1000e!
Comment 3 Chris Wilson 2019-01-07 23:13:02 UTC
Fwiw,

commit 2b3e88ea65287ba738a798622405b15344871085
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Sun Dec 16 18:30:14 2018 +0100

    net: phy: improve phy state checking
    
    Add helpers phy_is_started() and __phy_is_started() to avoid open-coded
    checks whether PHY has been started. To make the check easier move
    PHY_HALTED before PHY_UP in enum phy_state. Further improvements:
    
    phy_start_aneg():
    Return -EBUSY and print warning if function is called from a non-started
    state (DOWN, READY, HALTED). Better check because function is exported
    and drivers may use it incorrectly.
    
    phy_interrupt():
    Return IRQ_NONE also if state is DOWN or READY. We should never receive
    an interrupt in one of these states, but better play safe.
    
    phy_stop():
    Just return and print a warning if PHY is in a non-started state.
    This warning should help to identify drivers with unbalanced calls to
    phy_start() / phy_stop().
    
    phy_state_machine():
    Schedule state machine run only if PHY is in a started state.
    E.g. if state is READY we don't need the state machine, it will be
    started by phy_start().
    
    v2:
    - don't use __func__ within phy_warn_state
    v3:
    - use WARN() instead of printing error message to facilitate debugging
    
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

deemed to turn phy_start_aneg() while HALTED into an error (previously it still did the phy_config_aneg()).
Comment 4 Chris Wilson 2019-01-09 13:04:30 UTC
Local (topic/core-for-CI) revert for snb coverage:

commit 9d7c979969612aa4b57003ac853cfafcff01ccd5 (drm-intel/topic/core-for-CI, topic/core-for-CI)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jan 9 10:47:30 2019 +0000

    Revert "net: phy: improve phy state checking"
    
    This reverts commit 2b3e88ea65287ba738a798622405b15344871085.
Comment 5 Chris Wilson 2019-01-09 21:30:52 UTC
commit 3a73af8b1a2c76803911ea12970cd734b7f481ec (drm-intel/topic/core-for-CI, topic/core-for-CI)
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Wed Jan 9 20:34:56 2019 +0100

    net: phy: fix too strict check in phy_start_aneg
    
    When adding checks to detect wrong usage of the phylib API we added
    a check to phy_start_aneg() which is too strict. If the phylib
    state machine is in state PHY_HALTED we should allow reconfiguring
    and restarting aneg, and just don't touch the state.
    
    Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
    Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Comment 6 Martin Peres 2019-01-16 11:24:02 UTC
(In reply to Chris Wilson from comment #5)
> commit 3a73af8b1a2c76803911ea12970cd734b7f481ec
> (drm-intel/topic/core-for-CI, topic/core-for-CI)
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Wed Jan 9 20:34:56 2019 +0100
> 
>     net: phy: fix too strict check in phy_start_aneg
>     
>     When adding checks to detect wrong usage of the phylib API we added
>     a check to phy_start_aneg() which is too strict. If the phylib
>     state machine is in state PHY_HALTED we should allow reconfiguring
>     and restarting aneg, and just don't touch the state.
>     
>     Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
>     Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Thanks!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.