Bug 103533 - [BAT] igt@kms_addfb_basic@invalid-set-prop - incomplete - kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:886!
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Duplicates: 103027
Depends on:
Blocks:
 
Reported: 2017-11-01 07:10 UTC by Marta Löfstedt
Modified: 2017-11-30 07:56 UTC (History)
CC List: 2 users

See Also:
i915 platform: CNL
i915 features: GEM/execlists



Description Marta Löfstedt 2017-11-01 07:10:55 UTC
CI_DRM_3302 fi-cnl-y igt@kms_addfb_basic@invalid-set-prop

kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:886!
Note: this BUG comes from GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);

Bug 102035, by contrast, comes from GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
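
To make the difference between the two assertions concrete, here is a minimal sketch of the checks, not the driver code. It assumes the status buffer holds two dwords per entry, with the status word at buf[2 * head] (an assumption inferred from the index in the assertion above) and the context ID at buf[2 * head + 1]. The struct port_model type, the helper names, and the value of CTX_STATUS_PREEMPTED are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit value, for illustration only; check the driver headers. */
#define CTX_STATUS_PREEMPTED (1u << 1)

/* Hypothetical stand-in for the driver's per-port bookkeeping. */
struct port_model {
    uint32_t context_id;
};

/* Assumed layout: entry 'head' occupies dwords 2*head and 2*head + 1. */
static uint32_t csb_status(const uint32_t *buf, unsigned int head)
{
    return buf[2 * head];
}

static uint32_t csb_context_id(const uint32_t *buf, unsigned int head)
{
    return buf[2 * head + 1];
}

/* Check reported in this bug: the entry must name the context the port holds. */
int csb_entry_matches_port(const uint32_t *buf, unsigned int head,
                           const struct port_model *port)
{
    return csb_context_id(buf, head) == port->context_id;
}

/* Check reported in bug 102035: the status word carries the preempted bit. */
int csb_entry_is_preempted(const uint32_t *buf, unsigned int head)
{
    return (csb_status(buf, head) & CTX_STATUS_PREEMPTED) != 0;
}

/* Mimics the debug assertion: aborts when the context IDs disagree. */
void csb_check_entry(const uint32_t *buf, unsigned int head,
                     const struct port_model *port)
{
    assert(csb_entry_matches_port(buf, head, port));
}
```

In this model, the failure reported here corresponds to csb_entry_matches_port() returning 0 for the entry at head, while bug 102035 corresponds to csb_entry_is_preempted() returning 1.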

Also note that a secondary display was added to the CNL system starting with this run.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3302/fi-cnl-y/igt@kms_addfb_basic@invalid-set-prop.html

<14>[  304.794369] [IGT] kms_addfb_basic: starting subtest invalid-set-prop
<14>[  304.794685] [IGT] kms_addfb_basic: exiting, ret=0
<4>[  304.795912] ------------[ cut here ]------------
<2>[  304.795920] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:886!
<4>[  304.795954] invalid opcode: 0000 [#1] PREEMPT SMP
<4>[  304.795964] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul asix snd_hwdep usbnet snd_hda_core mii ghash_clmulni_intel e1000e snd_pcm ptp pps_core prime_numbers i2c_hid
<4>[  304.796034] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G     U          4.14.0-rc7-CI-CI_DRM_3302+ #1
<4>[  304.796048] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X100.A01.1708151220 08/15/2017
<4>[  304.796067] task: ffff880267058040 task.stack: ffffc90000104000
<4>[  304.796118] RIP: 0010:intel_lrc_irq_handler+0x2b2/0x8d0 [i915]
<4>[  304.796128] RSP: 0018:ffff880271183ea0 EFLAGS: 00010297
<4>[  304.796138] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000003
<4>[  304.796150] RDX: 0000000000000005 RSI: 0000000000000001 RDI: 00000000ffffffff
<4>[  304.796161] RBP: ffff880271183f00 R08: 0000000000000000 R09: 0000000000000001
<4>[  304.796173] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
<4>[  304.796184] R13: 0000000000000008 R14: ffff88025b4442a8 R15: ffff88025c4fa040
<4>[  304.796196] FS:  0000000000000000(0000) GS:ffff880271180000(0000) knlGS:0000000000000000
<4>[  304.796210] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  304.796220] CR2: 00007efc3a770000 CR3: 0000000003e0f005 CR4: 00000000007606e0
<4>[  304.796231] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[  304.796243] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[  304.796254] PKRU: 55555554
<4>[  304.796260] Call Trace:
<4>[  304.796266]  <IRQ>
<4>[  304.796274]  ? tasklet_hi_action+0x71/0x120
<4>[  304.796285]  tasklet_hi_action+0x98/0x120
<4>[  304.796295]  __do_softirq+0xc0/0x4ae
<4>[  304.796306]  irq_exit+0xae/0xc0
<4>[  304.796313]  do_IRQ+0x71/0x130
<4>[  304.796322]  common_interrupt+0x9a/0x9a
<4>[  304.796330]  </IRQ>
<4>[  304.796337] RIP: 0010:cpuidle_enter_state+0x136/0x370
<4>[  304.796346] RSP: 0018:ffffc90000107e80 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff1d
<4>[  304.796360] RAX: ffff880267058040 RBX: 0000000000001ad9 RCX: 0000000000000001
<4>[  304.796372] RDX: 0000000000000000 RSI: ffffffff81d0eb0c RDI: ffffffff81cc26f6
<4>[  304.796383] RBP: ffffc90000107eb8 R08: 000000000000003a R09: 0000000000000006
<4>[  304.796394] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
<4>[  304.796405] R13: 0000000000000001 R14: ffff880264b4f1b8 R15: 00000046f73fec43
<4>[  304.796436]  cpuidle_enter+0x17/0x20
<4>[  304.796446]  call_cpuidle+0x23/0x40
<4>[  304.796454]  do_idle+0x192/0x1e0
<4>[  304.796463]  cpu_startup_entry+0x1d/0x20
<4>[  304.796472]  start_secondary+0x11c/0x140
<4>[  304.796482]  secondary_startup_64+0xa5/0xa5
<4>[  304.796491] Code: 6d b0 4c 89 ef e8 af fc ff ff 4c 89 ef e8 67 fc ff ff 49 8b 86 80 03 00 00 a8 02 74 7e 41 0f ba b6 80 03 00 00 01 e9 ff fd ff ff <0f> 0b 0f 0b 0f 0b 4c 89 c7 e8 e0 dc 4a e1 e9 f0 fe ff ff 41 8b 
<1>[  304.796592] RIP: intel_lrc_irq_handler+0x2b2/0x8d0 [i915] RSP: ffff880271183ea0
<4>[  304.796619] ---[ end trace 7a8f0715b10dca37 ]---
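
When reading the trace above, note that the Code: line wraps the faulting byte in angle brackets. Here the marked bytes are 0f 0b, the x86 UD2 instruction, which is what produces the "invalid opcode" report for a kernel BUG. A minimal sketch of checking this mechanically follows; the function name oops_faults_on_ud2 is hypothetical, and it only looks at the marked byte and the one after it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Inspect an oops "Code:" line and report whether the byte marked with
 * <..> starts a UD2 instruction (0f 0b).
 * Returns 1 for UD2, 0 for any other marked instruction, and -1 when the
 * line has no "Code:" field or no marked byte followed by another byte.
 */
int oops_faults_on_ud2(const char *line)
{
    unsigned int first, second;
    const char *code, *mark;

    /* Skip past the log prefix, which also contains '<' (e.g. "<4>"). */
    code = strstr(line, "Code:");
    if (code == NULL)
        return -1;

    mark = strchr(code, '<');
    if (mark == NULL)
        return -1;

    /* Parse "<0f> 0b": the marked byte, then the byte that follows it. */
    if (sscanf(mark, "<%2x> %2x", &first, &second) != 2)
        return -1;

    return first == 0x0f && second == 0x0b;
}
```

Passing the full log line, including the "<4>[ ... ]" prefix, is fine because the search starts at the "Code:" field.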
Comment 1 Chris Wilson 2017-11-01 09:50:53 UTC
We had a bug for the cnl failure... Where did that go?
Comment 2 Marta Löfstedt 2017-11-07 10:42:14 UTC
*** Bug 103027 has been marked as a duplicate of this bug. ***
Comment 3 Marta Löfstedt 2017-11-09 15:17:42 UTC
Reproduced with a new subtest:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3326/fi-cnl-y/igt@gem_exec_flush@basic-wb-prw-default.html

<2>[  175.361223] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:886!
<4>[  175.361261] invalid opcode: 0000 [#1] PREEMPT SMP
<4>[  175.361271] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel crct10dif_pclmul crc32_pclmul snd_hda_codec asix snd_hwdep snd_hda_core ghash_clmulni_intel usbnet mii snd_pcm e1000e ptp pps_core prime_numbers i2c_hid
<4>[  175.361330] CPU: 3 PID: 1864 Comm: gem_exec_flush Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3326+ #1
<4>[  175.361341] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X100.A01.1708151220 08/15/2017
<4>[  175.361356] task: ffff8802601ec040 task.stack: ffffc90000598000
<4>[  175.361398] RIP: 0010:intel_lrc_irq_handler+0x2b4/0x8d0 [i915]
<4>[  175.361405] RSP: 0018:ffff880271183ea0 EFLAGS: 00010293
<4>[  175.361413] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000002
<4>[  175.361422] RDX: 0000000000000005 RSI: 0000000000000001 RDI: 00000000ffffffff
<4>[  175.361430] RBP: ffff880271183f00 R08: ffff8801d15ae340 R09: 0000000000000000
<4>[  175.361438] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
<4>[  175.361446] R13: 0000000000000008 R14: ffff880255912158 R15: ffff88025baf2040
<4>[  175.361455] FS:  00007f39cc90c8c0(0000) GS:ffff880271180000(0000) knlGS:0000000000000000
<4>[  175.361465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  175.361472] CR2: 00007ffd6c5c17e8 CR3: 0000000260e42004 CR4: 00000000007606e0
<4>[  175.361481] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[  175.361489] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[  175.361497] PKRU: 55555554
<4>[  175.361502] Call Trace:
<4>[  175.361506]  <IRQ>
<4>[  175.361514]  ? tasklet_hi_action+0x71/0x120
<4>[  175.361522]  tasklet_hi_action+0x98/0x120
<4>[  175.361530]  __do_softirq+0xc0/0x4ae
<4>[  175.361537]  irq_exit+0xae/0xc0
<4>[  175.361543]  do_IRQ+0x71/0x130
<4>[  175.361549]  common_interrupt+0x9a/0x9a
<4>[  175.361555]  </IRQ>
<4>[  175.361560] RIP: 0010:osq_lock+0x7c/0x110
<4>[  175.361566] RSP: 0018:ffffc9000059bbb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff1d
<4>[  175.361576] RAX: 0000000000000000 RBX: ffff88027119b840 RCX: 5a6fd2f7be4312d3
<4>[  175.361584] RDX: ffff8802601ec040 RSI: ffffffff81cb63b3 RDI: ffffffff81cc3ced
<4>[  175.361593] RBP: ffffc9000059bbd0 R08: ffff8802601ec900 R09: 0000000000000000
<4>[  175.361601] R10: 000000006be4761e R11: 000000009318ba4e R12: ffff88027111b840
<4>[  175.361610] R13: ffff88025b6600b0 R14: ffffffff81c7c692 R15: ffff8802601ec040
<4>[  175.361621]  ? osq_lock+0x2f/0x110
<4>[  175.361629]  __mutex_lock+0x665/0x9b0
<4>[  175.361635]  ? __mutex_unlock_slowpath+0x43/0x2c0
<4>[  175.361667]  ? i915_gem_pread_ioctl+0x2cc/0x7f0 [i915]
<4>[  175.361676]  ? lock_acquire+0xb0/0x200
<4>[  175.361706]  ? i915_gem_pread_ioctl+0x251/0x7f0 [i915]
<4>[  175.361736]  ? i915_gem_object_get_page+0x60/0x60 [i915]
<4>[  175.361744]  mutex_lock_interruptible_nested+0x1b/0x20
<4>[  175.361752]  ? mutex_lock_interruptible_nested+0x1b/0x20
<4>[  175.361782]  i915_gem_pread_ioctl+0x2cc/0x7f0 [i915]
<4>[  175.361789]  ? lock_acquire+0xb0/0x200
<4>[  175.361796]  ? __might_fault+0x3e/0x90
<4>[  175.361824]  ? i915_gem_object_get_page+0x60/0x60 [i915]
<4>[  175.361832]  drm_ioctl_kernel+0x69/0xb0
<4>[  175.361838]  drm_ioctl+0x2f9/0x3d0
<4>[  175.361864]  ? i915_gem_object_get_page+0x60/0x60 [i915]
<4>[  175.361874]  ? lock_acquire+0xb0/0x200
<4>[  175.361881]  ? __might_fault+0x3e/0x90
<4>[  175.361888]  do_vfs_ioctl+0x94/0x670
<4>[  175.361894]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
<4>[  175.361901]  ? __this_cpu_preempt_check+0x13/0x20
<4>[  175.361908]  ? trace_hardirqs_on_caller+0xe3/0x1b0
<4>[  175.361916]  SyS_ioctl+0x41/0x70
<4>[  175.361922]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  175.361929] RIP: 0033:0x7f39cae19587
<4>[  175.361934] RSP: 002b:00007ffd6c5c1778 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[  175.361944] RAX: ffffffffffffffda RBX: ffffffff81492003 RCX: 00007f39cae19587
<4>[  175.361953] RDX: 00007ffd6c5c17b0 RSI: 000000004020645c RDI: 0000000000000003
<4>[  175.361961] RBP: ffffc9000059bf88 R08: 0000000000000004 R09: 00007ffd6c5ec080
<4>[  175.361969] R10: 0000000000000073 R11: 0000000000000246 R12: 00000000000000d6
<4>[  175.361978] R13: 0000000000000003 R14: 000000004020645c R15: 0000000000000003
<4>[  175.361987]  ? __this_cpu_preempt_check+0x13/0x20
<4>[  175.361995] Code: 4c 89 ef e8 af fc ff ff 4c 89 ef e8 67 fc ff ff 49 8b 86 80 03 00 00 a8 02 74 7e 41 0f ba b6 80 03 00 00 01 e9 ff fd ff ff 0f 0b <0f> 0b 0f 0b 4c 89 c7 e8 e0 26 4a e1 e9 f0 fe ff ff 41 8b 46 28 
<1>[  175.362091] RIP: intel_lrc_irq_handler+0x2b4/0x8d0 [i915] RSP: ffff880271183ea0
<4>[  175.362117] ---[ end trace 72c868c4aac301ba ]---
Comment 4 Chris Wilson 2017-11-29 23:53:12 UTC
Not seen for a long time; unfortunately, it has also not recurred since we enabled GEM_TRACE to capture the execlists activity leading up to the bug. Presuming this is related to bug 103800, which has not been reproducible since roughly

commit ba74cb10c775c839f6e1d0fabd1e772eabd9c43f
Author: Michel Thierry <michel.thierry@intel.com>
Date:   Mon Nov 20 12:34:58 2017 +0000

    drm/i915/execlists: Delay writing to ELSP until HW has processed the previous write
Comment 5 Marta Löfstedt 2017-11-30 07:56:00 UTC
OK

