Bug 100130 - [SKL][BAT] gem_exec_flush@basic-uc-pro-default incomplete in CI
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Duplicates: 99742 100081 100082 100083 100084 100112 100193
Reported: 2017-03-08 20:45 UTC by Jani Saarinen
Modified: 2017-03-17 15:32 UTC (History)

i915 platform: SKL

Comment 1 Jani Saarinen 2017-03-09 06:52:59 UTC
Now another test is also incomplete.
Documenting: https://intel-gfx-ci.01.org/CI/CI_DRM_2307/fi-skl-6770hq/igt@gem_exec_flush@basic-batch-kernel-default-uc.html
Comment 2 Jani Saarinen 2017-03-09 06:56:30 UTC
Most recently seen after these changes:
79e440c drm-tip: 2017y-03m-08d-20h-49m-20s UTC integration manifest
5b5554c drm/i915: Check for an invalid seqno before __i915_gem_request_started
f166244 drm/i915: Purge i915_gem_object_is_dead()
03d1cac drm/i915: Avoiding recursing on ww_mutex inside shrinker
6f85859 drm-tip: 2017y-03m-08d-14h-47m-28s UTC integration manifest

But it might not be related to https://patchwork.freedesktop.org/series/20911/
as Chris says:
"More random unrelated fails, thanks for the report & review, pushed."
Comment 5 Chris Wilson 2017-03-09 10:38:10 UTC
Can we get a trimmed list (no suspend or hibernate) and run it in a loop on a skl (seems to be most susceptible) and see if we can get anything out of netconsole? Or just be able to manually collect information when it freezes?
Comment 6 Marta Löfstedt 2017-03-09 12:29:06 UTC
Chris, it was decided at the CI meeting that I should categorize all bugs on cibuglog.
I want to pinpoint all bugs where the run didn't terminate as expected; this would then be input to a task force to get to the bottom of this problem.
Comment 8 Marta Löfstedt 2017-03-09 13:10:16 UTC
These are the only incomplete bugs I have found so far that have a hudson timeout in igt.log and:

[   53.111596] [ INFO: possible circular locking dependency detected ]
[   53.111628] 4.11.0-rc1-CI-CI_DRM_2310+ #1 Not tainted

in dmesg_before.txt
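
The dmesg half of that criterion can be sketched as a plain substring check. This is only an illustration: the marker string is copied from the excerpt above, while the function name `has_lockdep_splat` and the idea of holding the log text in memory are assumptions, not part of any CI tooling.

```c
#include <assert.h>
#include <string.h>

/* Marker text copied from the dmesg excerpt quoted in this comment. */
#define LOCKDEP_MARKER "possible circular locking dependency detected"

/*
 * Hypothetical helper: returns 1 if the given dmesg text contains the
 * lockdep report header, 0 otherwise (including for a NULL pointer).
 */
static int has_lockdep_splat(const char *dmesg_text)
{
	return dmesg_text != NULL && strstr(dmesg_text, LOCKDEP_MARKER) != NULL;
}
```

A caller would read dmesg_before.txt into a NUL-terminated buffer and pass it in. The igt.log timeout check is left out because the exact wording of that log line is not given here.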
Comment 9 Marta Löfstedt 2017-03-09 13:17:21 UTC
Looks like the deadlock is in pstore.

We almost caught the ghost, but then pstore messed it up.
Comment 10 Marta Löfstedt 2017-03-09 14:03:45 UTC
I mailed Tony Luck about the deadlock; I believe he is the pstore maintainer.
Comment 12 Jani Saarinen 2017-03-13 08:14:22 UTC
Do we consider this to be in the same bucket even though it was run in patchwork?
https://patchwork.freedesktop.org/series/21020/
Comment 13 Martin Peres 2017-03-13 10:46:20 UTC
Chris Wilson thinks that this commit in igt may have fixed the issue (https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/commit/?id=9759df989f18697a817d5de27021bae09bcf344e).

Every run between CI_DRM_2306 and CI_DRM_2315 showed the issue, but nothing for the past 10 runs.

We'll keep an eye on this for a little longer before closing the bug.
Comment 14 Tomeu Vizoso 2017-03-14 14:24:14 UTC
(In reply to Martin Peres from comment #13)
> Chris Wilson thinks that this commit in igt may have fixed the issue
> (https://cgit.freedesktop.org/xorg/app/intel-gpu-tools/commit/
> ?id=9759df989f18697a817d5de27021bae09bcf344e).
> 
> Every run in between CI_DRM_2306 and CI_DRM_2315 were showing the issue, but
> nothing for the past 10 runs.
> 
> We'll keep an eye on this for a little longer before closing the bug.

From #intel-gfx:

ickle: tomeu: considering they still occur, my optimism that the signal fix was all that was required was wrong
Comment 15 Tomi Sarvela 2017-03-14 14:56:26 UTC
Reproduced using another 6700K/Z170 with CI_DRM_2333

i915_gem_request.h:203 is

        GEM_BUG_ON(fence && !dma_fence_is_i915(fence));

inside
static inline struct drm_i915_gem_request * to_request(struct dma_fence *fence) {}
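
For readers unfamiliar with this kind of check, here is a minimal userspace sketch of a checked downcast in the same spirit. It is not the kernel code: the types `fence`, `fence_ops` and `request` and the helpers `fence_is_i915` and `to_request_checked` are simplified, hypothetical stand-ins, and `assert` stands in for GEM_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct fence_ops { const char *name; };
struct fence { const struct fence_ops *ops; };

static const struct fence_ops i915_ops = { "i915" };
static const struct fence_ops foreign_ops = { "foreign" };

struct request {
	struct fence fence;	/* embedded fence, kept as the first member */
	unsigned int seqno;
};

/* A fence belongs to this driver only if it uses this driver's ops table. */
static int fence_is_i915(const struct fence *f)
{
	return f->ops == &i915_ops;
}

/*
 * Checked downcast from an embedded fence to its containing request.
 * NULL passes through unchanged; any non-NULL fence must carry the
 * driver's own ops table, otherwise the assertion fires.
 */
static struct request *to_request_checked(struct fence *f)
{
	assert(offsetof(struct request, fence) == 0);
	assert(!f || fence_is_i915(f));
	return (struct request *)f;
}
```

In this model the assertion trips whenever the pointer handed in is non-NULL but does not carry the expected ops table, for example a fence owned by another driver or memory that no longer holds a live request. Which of those applied in the trace below is not established by this sketch.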

---

[  794.038599] [IGT] gem_exec_flush: starting subtest basic-uc-pro-default
[  796.056398] ------------[ cut here ]------------
[  796.061108] kernel BUG at drivers/gpu/drm/i915/i915_gem_request.h:203!
[  796.067755] invalid opcode: 0000 [#1] PREEMPT SMP
[  796.072537] Modules linked in: snd_hda_intel i915 vgem x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek]
[  796.102872] CPU: 6 PID: 19066 Comm: gem_exec_flush Tainted: G     U          4.11.0-rc1-CI-CI_DRM_2333+ #1
[  796.112715] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2017
[  796.121998] task: ffff88042902cf40 task.stack: ffffc9000071c000
[  796.128037] RIP: 0010:notify_ring+0x219/0x220 [i915]
[  796.133108] RSP: 0018:ffff88043ed83c28 EFLAGS: 00010007
[  796.138431] RAX: 0000000000000001 RBX: ffff8803a1b22158 RCX: 0000000081edfc31
[  796.145702] RDX: 0000000081edfc30 RSI: 0000000000000000 RDI: ffff8804235aea20
[  796.152981] RBP: ffff88043ed83c48 R08: 0000000000000001 R09: 0000000000000001
[  796.160261] R10: 0000000000000000 R11: ffff88042902cf40 R12: ffff8804235aea20
[  796.167533] R13: ffffc9001143bbf8 R14: ffff8803a1b221a8 R15: ffff8804212e0000
[  796.174795] FS:  00007ff8623148c0(0000) GS:ffff88043ed80000(0000) knlGS:0000000000000000
[  796.183037] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  796.188896] CR2: 00007ffd5d3d6940 CR3: 000000037635a000 CR4: 00000000003406e0
[  796.196150] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  796.203422] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  796.210693] Call Trace:
[  796.213179]  <IRQ>
[  796.215245]  gen8_gt_irq_handler+0x219/0x290 [i915]
[  796.220236]  gen8_irq_handler+0x8e/0x6b0 [i915]
[  796.224855]  __handle_irq_event_percpu+0x58/0x370
[  796.229647]  handle_irq_event_percpu+0x1e/0x50
[  796.234181]  handle_irq_event+0x34/0x60
[  796.238089]  handle_edge_irq+0xbe/0x150
[  796.242008]  handle_irq+0x15/0x20
[  796.245377]  do_IRQ+0x63/0x130
[  796.248482]  common_interrupt+0x90/0x90
[  796.252390] RIP: 0010:_raw_spin_unlock_irqrestore+0x54/0x60
[  796.258056] RSP: 0018:ffff88043ed83ea0 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff18
[  796.265770] RAX: 0000000000000006 RBX: 0000000000000292 RCX: 0000000000000000
[  796.273024] RDX: ffffffffa008db2c RSI: 0000000000000001 RDI: ffffffff8187a552
[  796.280294] RBP: ffff88043ed83eb0 R08: 0000000000000005 R09: 0000000000000000
[  796.287574] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8804212e79d0
[  796.294836] R13: ffff8803a1b22450 R14: ffff8803a1b22158 R15: 0000000000000000
[  796.302070]  ? intel_lrc_irq_handler+0x45c/0x490 [i915]
[  796.307395]  ? _raw_spin_unlock_irqrestore+0x52/0x60
[  796.312481]  intel_lrc_irq_handler+0x45c/0x490 [i915]
[  796.317649]  tasklet_hi_action+0xf0/0x110
[  796.321738]  __do_softirq+0x116/0x4c0
[  796.325475]  irq_exit+0xa9/0xc0
[  796.328672]  do_IRQ+0x6c/0x130
[  796.331788]  ? i915_gem_pread_ioctl+0x234/0x7f0 [i915]
[  796.337018]  common_interrupt+0x90/0x90
[  796.340938] RIP: 0010:osq_lock+0x77/0x110
[  796.345034] RSP: 0018:ffffc9000071fbf0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff18
[  796.352731] RAX: 0000000000000000 RBX: ffff88043ed9ab40 RCX: 0000000000000002
[  796.359976] RDX: ffff88042902cf40 RSI: ffffffff81c6eedd RDI: ffffffff81c7ce87
[  796.367255] RBP: ffffc9000071fc08 R08: 0000000000000000 R09: 0000000000000000
[  796.374553] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88043ecdab40
[  796.381860] R13: ffff8804212e00b0 R14: ffffffffa00732b4 R15: ffff8804212e0070
[  796.389155]  </IRQ>
[  796.391308]  ? i915_gem_pread_ioctl+0x234/0x7f0 [i915]
[  796.396491]  __mutex_lock+0x649/0x990
[  796.400191]  ? __mutex_lock+0xb0/0x990
[  796.404022]  ? i915_gem_pread_ioctl+0x234/0x7f0 [i915]
[  796.409282]  ? i915_gem_pread_ioctl+0x1b0/0x7f0 [i915]
[  796.414533]  mutex_lock_interruptible_nested+0x16/0x20
[  796.419804]  i915_gem_pread_ioctl+0x234/0x7f0 [i915]
[  796.424795]  ? i915_gem_pread_ioctl+0x1b0/0x7f0 [i915]
[  796.430056]  ? __might_fault+0x87/0x90
[  796.433904]  ? __might_fault+0x3e/0x90
[  796.437701]  drm_ioctl+0x200/0x450
[  796.441140]  ? i915_gem_object_get_page+0x60/0x60 [i915]
[  796.446522]  ? retint_kernel+0x2d/0x2d
[  796.450335]  do_vfs_ioctl+0x90/0x6e0
[  796.453948]  SyS_ioctl+0x3c/0x70
[  796.457233]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  796.461940] RIP: 0033:0x7ff860d3d357
[  796.465561] RSP: 002b:00007ffd5d2df588 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  796.473233] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff860d3d357
[  796.480469] RDX: 00007ffd5d2df5c0 RSI: 000000004020645c RDI: 0000000000000003
[  796.487670] RBP: 00000000000003ee R08: 0000000000000004 R09: 0000000000000000
[  796.494924] R10: 000000000000003a R11: 0000000000000246 R12: 00007ff8623d9fb8
[  796.502187] R13: 0000000000000001 R14: 0000000000000fb8 R15: 0000000000000108
[  796.509408] Code: c0 0f 85 08 ff ff ff 48 c7 c2 70 13 15 a0 be ee 02 00 00 48 c7 c7 a0 13 15 a0 c6 05 6f 61 15 00 01 e8 cc 47 0b e1 e9 e4 fe ff ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 54 53 4 
[  796.528463] RIP: notify_ring+0x219/0x220 [i915] RSP: ffff88043ed83c28
[  796.535042] ---[ end trace dcc74bec3ebb6986 ]---
[  798.943346] Kernel panic - not syncing: Fatal exception in interrupt
[  800.026540] Shutting down cpus with NMI
[  800.030440] Kernel Offset: disabled
[  800.182376] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[  800.189634] ------------[ cut here ]------------
[  800.194346] WARNING: CPU: 6 PID: 19066 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x3a/0x40
[  800.203972] Modules linked in: snd_hda_intel i915 vgem x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek]
[  800.234255] CPU: 6 PID: 19066 Comm: gem_exec_flush Tainted: G     UD         4.11.0-rc1-CI-CI_DRM_2333+ #1
[  800.244064] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2017
[  800.253356] Call Trace:
[  800.255842]  <IRQ>
[  800.257905]  dump_stack+0x67/0x92
[  800.261276]  __warn+0xc6/0xe0
[  800.264300]  warn_slowpath_null+0x18/0x20
[  800.268367]  native_smp_send_reschedule+0x3a/0x40
[  800.273160]  trigger_load_balance+0x2cd/0x580
[  800.277604]  ? trigger_load_balance+0x6f/0x580
[  800.282094]  scheduler_tick+0x97/0xc0
[  800.285848]  ? tick_sched_handle.isra.7+0x30/0x30
[  800.290629]  update_process_times+0x42/0x50
[  800.294876]  tick_sched_handle.isra.7+0x29/0x30
[  800.299479]  tick_sched_timer+0x3d/0x70
[  800.303378]  __hrtimer_run_queues+0xf3/0x530
[  800.307739]  hrtimer_interrupt+0xb9/0x210
[  800.311828]  local_apic_timer_interrupt+0x31/0x50
[  800.316614]  smp_apic_timer_interrupt+0x33/0x50
[  800.321225]  apic_timer_interrupt+0x90/0xa0
[  800.325489] RIP: 0010:panic+0x1c2/0x1fb
[  800.329404] RSP: 0018:ffff88043ed83990 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[  800.337127] RAX: 0000000000000041 RBX: 0000000000000000 RCX: 0000000000000000
[  800.344381] RDX: 0000000000010104 RSI: ffffffff81c6eedd RDI: ffffffff8118352e
[  800.351627] RBP: ffff88043ed83a00 R08: 0000000000000001 R09: 0000000000000000
[  800.358899] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  800.366177] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  800.373415]  ? panic+0x1bf/0x1fb
[  800.376701]  ? kmsg_dump+0x11f/0x1c0
[  800.380340]  oops_end+0x78/0x90
[  800.383529]  die+0x46/0x60
[  800.386302]  do_trap+0xae/0x140
[  800.389508]  do_error_trap+0x88/0x120
[  800.393246]  ? notify_ring+0x219/0x220 [i915]
[  800.397690]  ? enqueue_task_fair+0xb6/0xe90
[  800.401946]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  800.406729]  do_invalid_op+0x1b/0x20
[  800.410379]  invalid_op+0x18/0x20
[  800.413758] RIP: 0010:notify_ring+0x219/0x220 [i915]
[  800.418811] RSP: 0018:ffff88043ed83c28 EFLAGS: 00010007
[  800.424132] RAX: 0000000000000001 RBX: ffff8803a1b22158 RCX: 0000000081edfc31
[  800.431393] RDX: 0000000081edfc30 RSI: 0000000000000000 RDI: ffff8804235aea20
[  800.438673] RBP: ffff88043ed83c48 R08: 0000000000000001 R09: 0000000000000001
[  800.445938] R10: 0000000000000000 R11: ffff88042902cf40 R12: ffff8804235aea20
[  800.453191] R13: ffffc9001143bbf8 R14: ffff8803a1b221a8 R15: ffff8804212e0000
[  800.460463]  ? notify_ring+0x5f/0x220 [i915]
[  800.464837]  gen8_gt_irq_handler+0x219/0x290 [i915]
[  800.469813]  gen8_irq_handler+0x8e/0x6b0 [i915]
[  800.474424]  __handle_irq_event_percpu+0x58/0x370
[  800.479209]  handle_irq_event_percpu+0x1e/0x50
[  800.483742]  handle_irq_event+0x34/0x60
[  800.487641]  handle_edge_irq+0xbe/0x150
[  800.491541]  handle_irq+0x15/0x20
[  800.494914]  do_IRQ+0x63/0x130
[  800.498025]  common_interrupt+0x90/0x90
[  800.501941] RIP: 0010:_raw_spin_unlock_irqrestore+0x54/0x60
[  800.507609] RSP: 0018:ffff88043ed83ea0 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff18
[  800.515330] RAX: 0000000000000006 RBX: 0000000000000292 RCX: 0000000000000000
[  800.522584] RDX: ffffffffa008db2c RSI: 0000000000000001 RDI: ffffffff8187a552
[  800.529830] RBP: ffff88043ed83eb0 R08: 0000000000000005 R09: 0000000000000000
[  800.537083] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8804212e79d0
[  800.544354] R13: ffff8803a1b22450 R14: ffff8803a1b22158 R15: 0000000000000000
[  800.551641]  ? intel_lrc_irq_handler+0x45c/0x490 [i915]
[  800.556965]  ? _raw_spin_unlock_irqrestore+0x52/0x60
[  800.562047]  intel_lrc_irq_handler+0x45c/0x490 [i915]
[  800.567184]  tasklet_hi_action+0xf0/0x110
[  800.571265]  __do_softirq+0x116/0x4c0
[  800.575011]  irq_exit+0xa9/0xc0
[  800.578216]  do_IRQ+0x6c/0x130
[  800.581338]  ? i915_gem_pread_ioctl+0x234/0x7f0 [i915]
[  800.586563]  common_interrupt+0x90/0x90
[  800.590497] RIP: 0010:osq_lock+0x77/0x110
[  800.594571] RSP: 0018:ffffc9000071fbf0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff18
[  800.602257] RAX: 0000000000000000 RBX: ffff88043ed9ab40 RCX: 0000000000000002
[  800.609538] RDX: ffff88042902cf40 RSI: ffffffff81c6eedd RDI: ffffffff81c7ce87
[  800.616809] RBP: ffffc9000071fc08 R08: 0000000000000000 R09: 0000000000000000
[  800.624063] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88043ecdab40
[  800.631299] R13: ffff8804212e00b0 R14: ffffffffa00732b4 R15: ffff8804212e0070
[  800.638553]  </IRQ>
[  800.640706]  ? i915_gem_pread_ioctl+0x234/0x7f0 [i915]
[  800.645922]  __mutex_lock+0x649/0x990
[  800.649640]  ? __mutex_lock+0xb0/0x990
[  800.653471]  ? i915_gem_pread_ioctl+0x234/0x7f0 [i915]
[  800.658696]  ? i915_gem_pread_ioctl+0x1b0/0x7f0 [i915]
[  800.663931]  mutex_lock_interruptible_nested+0x16/0x20
[  800.669174]  i915_gem_pread_ioctl+0x234/0x7f0 [i915]
[  800.674236]  ? i915_gem_pread_ioctl+0x1b0/0x7f0 [i915]
[  800.679469]  ? __might_fault+0x87/0x90
[  800.683275]  ? __might_fault+0x3e/0x90
[  800.687088]  drm_ioctl+0x200/0x450
[  800.690580]  ? i915_gem_object_get_page+0x60/0x60 [i915]
[  800.695981]  ? retint_kernel+0x2d/0x2d
[  800.699793]  do_vfs_ioctl+0x90/0x6e0
[  800.703423]  SyS_ioctl+0x3c/0x70
[  800.706718]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  800.711415] RIP: 0033:0x7ff860d3d357
[  800.715063] RSP: 002b:00007ffd5d2df588 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  800.722768] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff860d3d357
[  800.730047] RDX: 00007ffd5d2df5c0 RSI: 000000004020645c RDI: 0000000000000003
[  800.737311] RBP: 00000000000003ee R08: 0000000000000004 R09: 0000000000000000
[  800.744548] R10: 000000000000003a R11: 0000000000000246 R12: 00007ff8623d9fb8
[  800.751819] R13: 0000000000000001 R14: 0000000000000fb8 R15: 0000000000000108
[  800.759081] ---[ end trace dcc74bec3ebb6987 ]---
Comment 16 Chris Wilson 2017-03-16 14:28:59 UTC
*** Bug 100193 has been marked as a duplicate of this bug. ***
Comment 17 Chris Wilson 2017-03-16 14:29:04 UTC
*** Bug 100081 has been marked as a duplicate of this bug. ***
Comment 18 Chris Wilson 2017-03-16 14:29:09 UTC
*** Bug 100112 has been marked as a duplicate of this bug. ***
Comment 19 Chris Wilson 2017-03-16 14:29:13 UTC
*** Bug 100084 has been marked as a duplicate of this bug. ***
Comment 20 Chris Wilson 2017-03-16 14:29:19 UTC
*** Bug 100083 has been marked as a duplicate of this bug. ***
Comment 21 Chris Wilson 2017-03-16 14:29:24 UTC
*** Bug 100082 has been marked as a duplicate of this bug. ***
Comment 22 Chris Wilson 2017-03-16 14:41:05 UTC
*** Bug 99726 has been marked as a duplicate of this bug. ***
Comment 23 Chris Wilson 2017-03-16 15:20:37 UTC
I hoped

commit 429732e860fda07fc1bb96fe23c43146c27e08e0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 15 21:07:23 2017 +0000

    drm/i915/breadcrumbs: Update bottom-half before marking as complete

would take care of the trace Tomi reported. Still silent incompletes on farm1 - let's hope they are getting rarer.
Comment 24 Chris Wilson 2017-03-16 23:29:12 UTC
*** Bug 99742 has been marked as a duplicate of this bug. ***
Comment 25 Chris Wilson 2017-03-17 09:12:32 UTC
Now proclaiming fixed, see comment 23. Tomi found a separate issue with the same symptoms, i.e. nothing recorded by CI (bug 100232), that seems to have accounted for the last of them.
Comment 26 Chris Wilson 2017-03-17 15:32:14 UTC
*** Bug 100254 has been marked as a duplicate of this bug. ***

