Bug 111165

Summary: [CI][BAT][guc] igt@debugfs_test@read_all_entries_display_off - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log))
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/IntelAssignee: Robert M. Fosha <robert.m.fosha>
Status: RESOLVED MOVED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: high CC: intel-gfx-bugs, jon.ewins
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: ICL i915 features: firmware/guc

Description Martin Peres 2019-07-18 09:55:37 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6502/shard-iclb5/igt@debugfs_test@read_all_entries_display_off.html

<7> [159.391930] [IGT] Reading file "i915_guc_log_relay"
<5> [159.392247] i915 0000:00:02.0: GuC log buffer overflow
<5> [159.392259] i915 0000:00:02.0: GuC log buffer overflow
<4> [159.393301] ------------[ cut here ]------------
<4> [159.393303] WARN_ON(!intel_guc_log_relay_enabled(log))
<4> [159.393353] WARNING: CPU: 4 PID: 290 at drivers/gpu/drm/i915/gt/uc/intel_guc_log.c:247 guc_log_capture_logs+0x39d/0x470 [i915]
<4> [159.393355] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp snd_hda_intel snd_hda_codec mei_hdcp crct10dif_pclmul e1000e cdc_ether crc32_pclmul snd_hwdep usbnet mii snd_hda_core ghash_clmulni_intel snd_pcm ptp
<7> [159.393363] [IGT] Reading file "i915_guc_log_level"
<4> [159.393364]  pps_core mei_me mei prime_numbers
<4> [159.393368] CPU: 4 PID: 290 Comm: kworker/4:1H Tainted: G     U            5.2.0-CI-CI_DRM_6502+ #1
<4> [159.393369] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
<4> [159.393402] Workqueue: events_highpri capture_logs_work [i915]
<4> [159.393433] RIP: 0010:guc_log_capture_logs+0x39d/0x470 [i915]
<4> [159.393436] Code: c9 3a a0 e8 95 8e 3d e1 4d 89 f3 4c 8b 14 24 48 89 da e9 3f fe ff ff 48 c7 c6 c8 55 38 a0 48 c7 c7 cc c9 3a a0 e8 93 c1 e2 e0 <0f> 0b e9 c5 fe ff ff 48 c7 c6 00 30 35 a0 48 c7 c7 c0 ee 42 a0 e8
<4> [159.393439] RSP: 0018:ffffc900003b3dc0 EFLAGS: 00010282
<7> [159.393440] [IGT] Reading file "i915_dp_test_active"
<4> [159.393442] RAX: 0000000000000000 RBX: ffff88848ac0b8e0 RCX: 0000000000000004
<4> [159.393446] RDX: 000000000000202b RSI: ffff888497b64908 RDI: ffffffff8212d681
<4> [159.393448] RBP: ffffc900003b3ec8 R08: 00000000ca1d5cb3 R09: 0000000000000000
<4> [159.393451] R10: ffffc900003b3dc0 R11: 0000000000000000 R12: ffff88849fe39b40
<4> [159.393453] R13: ffff88848ac0b8c8 R14: 0000000000000000 R15: 0000000000000000
<4> [159.393455] FS:  0000000000000000(0000) GS:ffff88849fe00000(0000) knlGS:0000000000000000
<4> [159.393457] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [159.393460] CR2: 00007f42b9b35778 CR3: 0000000005210006 CR4: 0000000000760ee0
<4> [159.393462] PKRU: 55555554
<4> [159.393463] Call Trace:
<4> [159.393469]  ? process_one_work+0x1bf/0x610
<4> [159.393474]  process_one_work+0x245/0x610
<4> [159.393479]  worker_thread+0x37/0x380
<4> [159.393482]  ? process_one_work+0x610/0x610
<4> [159.393484]  kthread+0x119/0x130
<4> [159.393487]  ? kthread_park+0x80/0x80
<4> [159.393491]  ret_from_fork+0x24/0x50
<7> [159.393495] [IGT] Reading file "i915_dp_test_type"
<4> [159.393498] irq event stamp: 1770026
<4> [159.393510] hardirqs last  enabled at (1770025): [<ffffffff8112879c>] vprintk_emit+0xcc/0x340
<4> [159.393513] hardirqs last disabled at (1770026): [<ffffffff810019f0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [159.393516] softirqs last  enabled at (1767806): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [159.393519] softirqs last disabled at (1767799): [<ffffffff810b64b9>] irq_exit+0xa9/0xc0
<4> [159.393550] WARNING: CPU: 4 PID: 290 at drivers/gpu/drm/i915/gt/uc/intel_guc_log.c:247 guc_log_capture_logs+0x39d/0x470 [i915]
<4> [159.393553] ---[ end trace 8e04a94f71e01f11 ]---
Comment 1 Chris Wilson 2019-07-18 09:57:10 UTC
https://patchwork.freedesktop.org/series/63863/
Comment 2 Martin Peres 2019-07-18 09:57:37 UTC
Assigning Michal since he landed the change enabling HuC loading by default on ICL.
Comment 3 CI Bug Log 2019-07-18 10:54:22 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@debugfs_test@read_all_entries_display_off - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log))
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6502/shard-iclb5/igt@debugfs_test@read_all_entries_display_off.html
Comment 4 CI Bug Log 2019-07-22 08:35:11 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@debugfs_test@read_all_entries_display_off - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) -}
{+ ICL: igt@debugfs_test@read_all_entries_display_off|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_326/fi-icl-u3/igt@i915_suspend@debugfs-reader.html
Comment 5 Lakshmi 2019-07-22 08:35:53 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_326/fi-icl-u3/igt@i915_suspend@debugfs-reader.html

] ------------[ cut here ]------------
<4> [376.421788] WARN_ON(!intel_guc_log_relay_enabled(log))
<4> [376.421908] WARNING: CPU: 3 PID: 339 at drivers/gpu/drm/i915/gt/uc/intel_guc_log.c:247 guc_log_capture_logs+0x39d/0x470 [i915]
<4> [376.421910] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel btusb snd_hda_codec btrtl btbcm btintel snd_hwdep snd_hda_core bluetooth e1000e snd_pcm ptp cdc_ether pps_core usbnet mii ecdh_generic ecc mei_me mei prime_numbers
<4> [376.421930] CPU: 3 PID: 339 Comm: kworker/3:1H Tainted: G     U            5.2.0-g9b39702210b8-drmtip_326+ #1
<4> [376.421932] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [376.421971] Workqueue: events_highpri capture_logs_work [i915]
<4> [376.422010] RIP: 0010:guc_log_capture_logs+0x39d/0x470 [i915]
<4> [376.422012] Code: b9 63 c0 e8 85 83 14 d4 4d 89 f3 4c 8b 14 24 48 89 da e9 3f fe ff ff 48 c7 c6 c0 45 61 c0 48 c7 c7 c4 b9 63 c0 e8 93 11 ba d3 <0f> 0b e9 c5 fe ff ff 48 c7 c6 00 20 5e c0 48 c7 c7 c0 de 6b c0 e8
<4> [376.422015] RSP: 0018:ffffa0088039bdc0 EFLAGS: 00010282
<4> [376.422017] RAX: 0000000000000000 RBX: ffff98033c94b8e0 RCX: 0000000000000003
<4> [376.422019] RDX: 00000000000021b1 RSI: ffff980353b54908 RDI: ffffffff950f8e69
<4> [376.422021] RBP: ffffa0088039bec8 R08: 00000000010b125d R09: 0000000000000000
<4> [376.422023] R10: ffffa0088039bdc0 R11: 0000000000000000 R12: ffff98035fdb9b80
<4> [376.422025] R13: ffff98033c94b8c8 R14: 0000000000000000 R15: 0000000000000000
<4> [376.422027] FS:  0000000000000000(0000) GS:ffff98035fd80000(0000) knlGS:0000000000000000
<4> [376.422030] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [376.422032] CR2: 000055db1136d000 CR3: 00000003e3210005 CR4: 0000000000760ee0
<4> [376.422034] PKRU: 55555554
<4> [376.422035] Call Trace:
<4> [376.422043]  ? process_one_work+0x1bf/0x610
<4> [376.422050]  process_one_work+0x245/0x610
<4> [376.422059]  worker_thread+0x37/0x380
<4> [376.422093]  ? process_one_work+0x610/0x610
<4> [376.422097]  kthread+0x119/0x130
<4> [376.422100]  ? kthread_park+0x80/0x80
<4> [376.422105]  ret_from_fork+0x24/0x50
<4> [376.422114] irq event stamp: 180780
<4> [376.422117] hardirqs last  enabled at (180779): [<ffffffff9412c57c>] vprintk_emit+0xcc/0x340
<4> [376.422120] hardirqs last disabled at (180780): [<ffffffff940019f0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [376.422123] softirqs last  enabled at (178422): [<ffffffff94c0033a>] __do_softirq+0x33a/0x4b9
<4> [376.422125] softirqs last disabled at (178415): [<ffffffff940ba239>] irq_exit+0xa9/0xc0
<4> [376.422192] WARNING: CPU: 3 PID: 339 at drivers/gpu/drm/i915/gt/uc/intel_guc_log.c:247 guc_log_capture_logs+0x39d/0x470 [i915]
<4> [376.422194] ---[ end trace 1500936934077ddb ]---
Comment 6 CI Bug Log 2019-07-22 11:18:29 UTC
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@debugfs_test@read_all_entries_display_off|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) -}
{+ GUC: igt@debugfs_test@read_all_entries_display_off|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_326/fi-apl-guc/igt@i915_suspend@debugfs-reader.html
  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_328/fi-cfl-guc/igt@i915_suspend@debugfs-reader.html
Comment 7 CI Bug Log 2019-07-22 11:18:39 UTC
A CI Bug Log filter associated to this bug has been updated:

{- GUC: igt@debugfs_test@read_all_entries_display_off|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) -}
{+ GUC: igt@debugfs_test@read_all_entries_display_off|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) +}


  No new failures caught with the new filter
Comment 8 CI Bug Log 2019-07-24 14:43:00 UTC
A CI Bug Log filter associated to this bug has been updated:

{- GUC: igt@debugfs_test@read_all_entries_display_off|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) -}
{+ GUC: igt@debugfs_test@read_all_entries_display_off|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_333/fi-cfl-guc/igt@debugfs_test@read_all_entries_display_on.html
Comment 9 CI Bug Log 2019-07-24 14:43:55 UTC
A CI Bug Log filter associated to this bug has been updated:

{- GUC: igt@debugfs_test@read_all_entries_display_off|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) -}
{+ GUC: igt@debugfs_test@read_all_entries_display_(off|on)|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) +}


  No new failures caught with the new filter
Comment 10 CI Bug Log 2019-07-24 14:45:12 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* GUC: igt@runner@aborted - fail - Previous test: debugfs_test (read_all_entries_display_on)
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_333/fi-cfl-guc/igt@runner@aborted.html
Comment 11 Lakshmi 2019-08-20 13:16:44 UTC
Based on the comments from Michal, Updating the priority as the user impact is negligible impact, P2/P3 due to WARN.
Comment 12 Jon Ewins 2019-08-21 01:23:50 UTC
The relay log is a developer only feature and an extension of the standard GuC logging mechanism.  Its testing should not be part of the BAT set.  It is being implicitly enabled by existing tests, which provide no consumer for the logs.  
The issue is that IGT tests that cycle through reading the debugfs entries will call the open file op for the guc_log_relay control file and this currently causes the file to be both created and the logging to be started.  These tests have no consumer for the logs which go on to overflow, run out of sub-buffers or misbehave in suspend and reset handling.
A proposed change, that will follow as RFC patches, is to separate the creation and logging start, updating the guc_log_relay_write function, which currently just flushes the log, to also support starting the logging based on the value written.
An additional new test to actually explicitly test this relay log developer feature will follow separately.
Comment 13 Jon Ewins 2019-09-10 01:16:11 UTC
Same issue as 109536 and 111148. Proposed fix tested and passing.  Patches in prep for posting.
Comment 14 Robert M. Fosha 2019-09-10 23:07:13 UTC
RFC patch sent out for review: drm/i915/guc: Enable guc logging on guc log relay write.

While creating and testing the patch also noticed that the IGT tool intel_guc_logger needs to be updated for the new relay implementation. Will create another patch to update the IGT tool.
Comment 15 Chris Wilson 2019-09-19 20:15:55 UTC
(In reply to Robert M. Fosha from comment #14)
> RFC patch sent out for review: drm/i915/guc: Enable guc logging on guc log
> relay write.

That doesn't fix this bug; that just papers over the bug by not encountering it.
Comment 16 CI Bug Log 2019-10-08 20:48:08 UTC
A CI Bug Log filter associated to this bug has been updated:

{- GUC: igt@debugfs_test@read_all_entries_display_(off|on)|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) -}
{+ GUC: igt@debugfs_test@read_all_entries_display_(off|on)|igt@i915_suspend@debugfs-reader - dmesg-warn - WARN_ON(!intel_guc_log_relay_enabled(log)) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_385/fi-cfl-guc/igt@debugfs_test@read_all_entries.html
Comment 17 CI Bug Log 2019-10-08 20:50:41 UTC
A CI Bug Log filter associated to this bug has been updated:

{- GUC: igt@runner@aborted - fail - Previous test: debugfs_test (read_all_entries_display_on) -}
{+ GUC: igt@runner@aborted - fail - Previous test: debugfs_test (read_all_entries_display_on) +}

New failures caught by the filter:

  * https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_385/fi-cfl-guc/igt@runner@aborted.html
Comment 18 Martin Peres 2019-11-29 19:17:22 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/338.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.