Bug 106702 - [CI][DRMTIP] igt@gem_eio@in-flight-suspend- incomplete - GEM_BUG_ON(buf[2 * head + 1] != port->context_id)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Joonas Lahtinen
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Reported: 2018-05-29 08:22 UTC by Martin Peres
Modified: 2019-03-06 18:19 UTC
i915 platform: KBL
i915 features: GEM/Other


Description Martin Peres 2018-05-29 08:22:53 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_51/fi-kbl-r/igt@gem_eio@in-flight-suspend.html

process_csb:1074 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)
<0>[   75.890016] ---------------------------------
<4>[   75.890017] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb asix btrtl btbcm usbnet mii btintel snd_hda_intel bluetooth snd_hda_codec snd_hwdep x86_pkg_temp_thermal intel_powerclamp snd_hda_core coretemp crct10dif_pclmul crc32_pclmul ecdh_generic ghash_clmulni_intel e1000e snd_pcm mei_me mei prime_numbers pinctrl_sunrisepoint pinctrl_intel
<4>[   75.890035] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G     U            4.17.0-rc6-g5195e857106a-drmtip_51+ #1
<4>[   75.890036] Hardware name: Intel Corporation Kabylake Client platform/Kabylake R DDR4 RVP, BIOS KBLSE2R1.R00.X078.P02.1703030515 03/03/2017
<4>[   75.890056] RIP: 0010:process_csb+0x638/0x8d0 [i915]
<4>[   75.890057] RSP: 0018:ffffa2c4bec83e20 EFLAGS: 00010282
<4>[   75.890058] RAX: 000000000000000e RBX: 0000000000000002 RCX: 0000000000000000
<4>[   75.890059] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffa2c4b5998ff8
<4>[   75.890060] RBP: ffffa2c4bec83e90 R08: 000000000000023a R09: ffffa2c4b5364000
<4>[   75.890060] R10: 0000000000000001 R11: ffffa2c4b5998ff8 R12: ffffa2c49770e054
<4>[   75.890061] R13: ffffa2c3ff008040 R14: ffffa2c49770e040 R15: ffffa2c4912cc2a8
<4>[   75.890062] FS:  0000000000000000(0000) GS:ffffa2c4bec80000(0000) knlGS:0000000000000000
<4>[   75.890063] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   75.890064] CR2: 000055cd7986f5b8 CR3: 00000002a8210005 CR4: 00000000003606e0
<4>[   75.890065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[   75.890065] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[   75.890066] Call Trace:
<4>[   75.890067]  <IRQ>
<4>[   75.890089]  execlists_submission_tasklet+0xb1/0xe80 [i915]
<4>[   75.890092]  ? lock_acquire+0xa6/0x210
<4>[   75.890094]  ? handle_irq_event+0x3a/0x50
<4>[   75.890097]  tasklet_action_common.isra.5+0x47/0xb0
<4>[   75.890099]  __do_softirq+0xc1/0x4e1
<4>[   75.890101]  ? _raw_spin_unlock+0x29/0x40
<4>[   75.890103]  irq_exit+0xa4/0xb0
<4>[   75.890104]  do_IRQ+0x9a/0x120
<4>[   75.890106]  common_interrupt+0xf/0xf
<4>[   75.890107]  </IRQ>
<4>[   75.890110] RIP: 0010:cpuidle_enter_state+0xac/0x360
<4>[   75.890110] RSP: 0018:ffffa4e6000ffe90 EFLAGS: 00000216 ORIG_RAX: ffffffffffffffda
<4>[   75.890112] RAX: ffffa2c4b52ca800 RBX: 000000000000078d RCX: 0000000000000000
<4>[   75.890113] RDX: 0000000000000046 RSI: ffffffffa10fb831 RDI: ffffffffa10a89bf
<4>[   75.890113] RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000000
<4>[   75.890114] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa12963d8
<4>[   75.890115] R13: ffffc4e5ffc84eb0 R14: 0000000000000000 R15: 00000011aab28c9a
<4>[   75.890119]  do_idle+0x1f3/0x250
<4>[   75.890121]  cpu_startup_entry+0x6a/0x70
<4>[   75.890124]  start_secondary+0x198/0x1e0
<4>[   75.890126]  secondary_startup_64+0xa5/0xb0
<4>[   75.890129] Code: e8 f7 cb be df 48 8b 35 4f 57 19 00 49 c7 c0 50 00 61 c0 b9 32 04 00 00 48 c7 c2 50 8e 5d c0 48 c7 c7 43 08 51 c0 e8 28 36 c5 df <0f> 0b 48 c7 c1 8a 26 5f c0 ba 34 04 00 00 48 c7 c6 50 8e 5d c0 
<1>[   75.890178] RIP: process_csb+0x638/0x8d0 [i915] RSP: ffffa2c4bec83e20
<4>[   75.890182] ---[ end trace 4602640e33b92121 ]---
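For readers unfamiliar with the assertion, the following is a minimal Python sketch of the kind of consistency check that fired. It is a toy model, not the i915 code: `Port`, `CsbMismatch`, `process_event` and `demo` are hypothetical names, and the buffer layout (two slots per event, with the context ID in the odd slot) is inferred only from the `buf[2 * head + 1]` expression in the log above. Treating the even slot as a status word is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    """Software's record of the context it believes is running (hypothetical)."""
    context_id: int


class CsbMismatch(AssertionError):
    """Raised where the real driver would hit GEM_BUG_ON."""


def process_event(buf: List[int], head: int, port: Port) -> int:
    """Check one event in a flat buffer of (status, context_id) pairs.

    Assumption: slot 2*head holds a status word and slot 2*head+1 holds
    the context ID reported by the hardware.
    """
    status = buf[2 * head]
    reported_id = buf[2 * head + 1]
    if reported_id != port.context_id:
        raise CsbMismatch(
            f"event {head}: hardware reported context {reported_id:#x}, "
            f"software expected {port.context_id:#x}"
        )
    return status


def demo() -> str:
    port = Port(context_id=0x10)
    # Event 0 matches the tracked context; event 1 is a stale entry for a
    # context that software is no longer tracking.
    buf = [0x1, 0x10, 0x1, 0x2A]
    process_event(buf, 0, port)
    try:
        process_event(buf, 1, port)
    except CsbMismatch as exc:
        return str(exc)
    return "no mismatch"


if __name__ == "__main__":
    print(demo())
```

In this model the check only passes while the hardware's reported context and the software's tracked context stay in step, which is the invariant the BUG in the trace reports as broken.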
Comment 1 Martin Peres 2018-05-29 08:23:25 UTC
This looks like a regression introduced in drmtip_51, so bumping the priority as Linux 4.17 is about to be released.
Comment 2 Chris Wilson 2018-05-29 09:45:39 UTC
(In reply to Martin Peres from comment #1)
> This looks like a regression introduced in drmtip_51, so bumping the
> priority as Linux 4.17 is about to be released.

tip is targeting 4.18. IIUC, the trace can only be generated by gem_eio, as it requires both simulating a suspend with TEST_DEVICES and disabling the GPU reset. It should be fixed by https://patchwork.freedesktop.org/patch/225442/ and, if my reckoning is correct, we could have hit this since

commit ac697ae8013a7c7301174c9c3b02a92fe418b7ea
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Mar 15 15:10:15 2018 +0000

    drm/i915: Stop engines when declaring the machine wedged
Comment 3 Martin Peres 2018-05-29 11:55:05 UTC
(In reply to Chris Wilson from comment #2)
> (In reply to Martin Peres from comment #1)
> > This looks like a regression introduced in drmtip_51, so bumping the
> > priority as Linux 4.17 is about to be released.
> 
> tip is targeting 4.18.

Sure, but the problem may come from Linus' tip and have been introduced here through a backmerge. If you can tell me that this is not the case, then we can lower the priority.

> IIUC, the trace can only be generated by gem_eio as
> it requires both simulating a suspend with TEST_DEVICES and disabling the
> GPU reset. It should be fixed by
> https://patchwork.freedesktop.org/patch/225442/ and if my reckoning is
> correct, we could have hit this since
> 
> commit ac697ae8013a7c7301174c9c3b02a92fe418b7ea
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Mar 15 15:10:15 2018 +0000
> 
>     drm/i915: Stop engines when declaring the machine wedged

Thanks! Let's see :)
Comment 4 Chris Wilson 2018-05-31 18:37:35 UTC
I applied commit c3160da9a6af0e2d8f4fb3410df9d027a178ca3d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 31 09:22:45 2018 +0100

    drm/i915: After reset on sanitization, reset the engine backends
    
    As we reset the GPU on suspend/resume, we also do need to reset the
    engine state tracking so call into the engine backends. This is
    especially important so that we can also sanitize the state tracking
    across resume.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=106702
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-3-chris@chris-wilson.co.uk

which I claim to be sufficient to prevent this BUG().
Comment 5 Martin Peres 2018-06-14 14:39:16 UTC
(In reply to Chris Wilson from comment #4)
> I applied commit c3160da9a6af0e2d8f4fb3410df9d027a178ca3d
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu May 31 09:22:45 2018 +0100
> 
>     drm/i915: After reset on sanitization, reset the engine backends
>     
>     As we reset the GPU on suspend/resume, we also do need to reset the
>     engine state tracking so call into the engine backends. This is
>     especially important so that we can also sanitize the state tracking
>     across resume.
>     
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=106702
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-3-chris@chris-wilson.co.uk
> 
> which I claim to be sufficient to prevent this BUG().

Your claim did not hold up to reality as it is still happening at every single run... try again?
Comment 6 Chris Wilson 2018-06-15 11:30:45 UTC
(In reply to Martin Peres from comment #5)
> Your claim did not hold up to reality as it is still happening at every
> single run... try again?

What are you talking about?
Comment 7 Martin Peres 2018-06-17 17:57:26 UTC
(In reply to Chris Wilson from comment #6)
> What are you talking about?

I meant that the patch apparently was not sufficient, as we still have this problem :s I'll provide you with links tomorrow if you need me to :)
Comment 8 Martin Peres 2018-09-18 08:46:13 UTC
(In reply to Martin Peres from comment #7)
> I meant that the patch apparently was not sufficient, as we still have this
> problem :s I'll provide you with links tomorrow if you need me to :)

Better late than never:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_110/fi-kbl-r/igt@gem_eio@in-flight-suspend.html

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_110/fi-kbl-x1275/igt@gem_eio@in-flight-suspend.html
Comment 9 Martin Peres 2018-10-23 13:46:15 UTC
Also seen on ICL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1980/fi-icl-u/igt%40drv_selftest%40live_contexts.html

<3> [509.859079] process_csb:953 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)
<4> [509.859207] ------------[ cut here ]------------
<2> [509.859209] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:953!
<4> [509.859217] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4> [509.859220] CPU: 2 PID: 4657 Comm: drv_selftest Tainted: G     U  W         4.19.0-rc8-CI-CI_DRM_5020+ #1
<4> [509.859222] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.2392.A04.1809260455 09/26/2018
<4> [509.859283] RIP: 0010:process_csb+0x5c6/0x790 [i915]
<4> [509.859286] Code: 69 87 b9 e0 48 8b 35 99 f9 19 00 49 c7 c0 e0 7b 66 a0 b9 b9 03 00 00 48 c7 c2 10 fc 62 a0 48 c7 c7 e1 18 56 a0 e8 3a 17 c0 e0 <0f> 0b 48 c7 c1 28 9b 64 a0 ba bb 03 00 00 48 c7 c6 10 fc 62 a0 48
<4> [509.859288] RSP: 0018:ffff8804afe83e20 EFLAGS: 00010082
<4> [509.859291] RAX: 000000000000000e RBX: ffff8804aa16c2a8 RCX: 0000000000000000
<4> [509.859293] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8804ae250aa8
<4> [509.859295] RBP: ffff8804afe83e88 R08: 00000000009ccca1 R09: ffff8804ae3f4000
<4> [509.859297] R10: 0000000000000001 R11: ffff8804ae250aa8 R12: ffff8804a122c05c
<4> [509.859299] R13: 0000000000000003 R14: ffff880425d896c0 R15: ffff8804a122c040
<4> [509.859301] FS:  00007f53c3ea6980(0000) GS:ffff8804afe80000(0000) knlGS:0000000000000000
<4> [509.859303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [509.859305] CR2: 00007f7f12bc0140 CR3: 000000047cc10005 CR4: 0000000000760ee0
<4> [509.859307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [509.859309] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [509.859311] PKRU: 55555554
<4> [509.859312] Call Trace:
<4> [509.859315]  <IRQ>
<4> [509.859360]  __execlists_submission_tasklet+0x2c/0xc20 [i915]
<4> [509.859397]  execlists_submission_tasklet+0x46/0x60 [i915]
<4> [509.859403]  tasklet_action_common.isra.5+0x47/0xb0
<4> [509.859408]  __do_softirq+0xd8/0x483
<4> [509.859412]  ? _raw_spin_unlock+0x29/0x40
<4> [509.859415]  irq_exit+0xa9/0xc0
<4> [509.859418]  do_IRQ+0x9a/0x120
<4> [509.859422]  common_interrupt+0xf/0xf
<4> [509.859424]  </IRQ>
<4> [509.859427] RIP: 0010:_raw_spin_unlock_irqrestore+0x4e/0x60
<4> [509.859429] Code: c7 02 75 1f 53 9d e8 d1 28 82 ff bf 01 00 00 00 e8 27 17 77 ff 65 8b 05 e0 3b 6d 7e 85 c0 74 0c 5b 5d c3 e8 c4 26 82 ff 53 9d <eb> df e8 85 06 6c ff 5b 5d c3 0f 1f 84 00 00 00 00 00 53 48 8b 54
<4> [509.859431] RSP: 0018:ffffc90000357868 EFLAGS: 00000282 ORIG_RAX: ffffffffffffffde
<4> [509.859434] RAX: ffff8804610f4040 RBX: 0000000000000282 RCX: 0000000000000006
<4> [509.859436] RDX: 000000000000153b RSI: ffffffff8212508a RDI: ffffffff820d3a9f
<4> [509.859438] RBP: ffff8804a9d41c40 R08: 00000000efd5b9de R09: 0000000000000000
<4> [509.859440] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0010976200
<4> [509.859442] R13: 0000000000000001 R14: ffff880425d88000 R15: 0000000000000001
<4> [509.859449]  free_debug_processing+0x27d/0x380
<4> [509.859489]  ? i915_request_retire_upto+0xfd/0x150 [i915]
<4> [509.859493]  __slab_free+0x33c/0x4f0
<4> [509.859496]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [509.859500]  ? lockdep_hardirqs_on+0xe0/0x1b0
<4> [509.859503]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4> [509.859507]  ? debug_check_no_obj_freed+0x132/0x210
<4> [509.859541]  ? i915_request_retire_upto+0xfd/0x150 [i915]
<4> [509.859544]  ? kmem_cache_free+0x279/0x2e0
<4> [509.859547]  kmem_cache_free+0x279/0x2e0
<4> [509.859577]  i915_request_retire_upto+0xfd/0x150 [i915]
<4> [509.859607]  i915_request_add+0x3ba/0x7e0 [i915]
<4> [509.859650]  live_nop_switch+0x229/0x470 [i915]
<4> [509.859704]  __i915_subtests+0x5e/0xf0 [i915]
<4> [509.859751]  __run_selftests+0x10b/0x190 [i915]
<4> [509.859786]  i915_live_selftests+0x2c/0x60 [i915]
<4> [509.859823]  i915_pci_probe+0x50/0xa0 [i915]
<4> [509.859828]  pci_device_probe+0xa1/0x130
<4> [509.859833]  really_probe+0x25d/0x3c0
<4> [509.859836]  driver_probe_device+0x10a/0x120
<4> [509.859840]  __driver_attach+0xdb/0x100
<4> [509.859843]  ? driver_probe_device+0x120/0x120
<4> [509.859845]  bus_for_each_dev+0x74/0xc0
<4> [509.859849]  bus_add_driver+0x15f/0x250
<4> [509.859851]  ? 0xffffffffa0a0b000
<4> [509.859854]  driver_register+0x56/0xe0
<4> [509.859857]  ? 0xffffffffa0a0b000
<4> [509.859860]  do_one_initcall+0x58/0x2e0
<4> [509.859863]  ? rcu_lockdep_current_cpu_online+0x8f/0xd0
<4> [509.859866]  ? do_init_module+0x1d/0x1ea
<4> [509.859870]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [509.859873]  ? kmem_cache_alloc_trace+0x264/0x290
<4> [509.859876]  do_init_module+0x56/0x1ea
<4> [509.859882]  load_module+0x26f5/0x29d0
<4> [509.859887]  ? vfs_read+0x122/0x140
<4> [509.859893]  ? __se_sys_finit_module+0xd3/0xf0
<4> [509.859896]  __se_sys_finit_module+0xd3/0xf0
<4> [509.859902]  do_syscall_64+0x55/0x190
<4> [509.859905]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [509.859907] RIP: 0033:0x7f53c3770839
<4> [509.859910] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [509.859912] RSP: 002b:00007ffd743b33e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [509.859915] RAX: ffffffffffffffda RBX: 000055e0b973eda0 RCX: 00007f53c3770839
<4> [509.859917] RDX: 0000000000000000 RSI: 000055e0b973fb40 RDI: 0000000000000006
<4> [509.859919] RBP: 000055e0b973fb40 R08: 0000000000000004 R09: 0000000000000000
<4> [509.859921] R10: 00007ffd743b3560 R11: 0000000000000246 R12: 0000000000000000
<4> [509.859923] R13: 000055e0b97387e0 R14: 0000000000000020 R15: 000000000000003c
<4> [509.859928] Modules linked in: i915(+) amdgpu chash gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ax88179_178a usbnet x86_pkg_temp_thermal mii coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core e1000e snd_pcm prime_numbers [last unloaded: i915]
<0> [509.859952] Dumping ftrace buffer:
<0> [509.859954] ---------------------------------
[...]
<0> [509.882872] ---------------------------------
<4> [509.882876] ---[ end trace 96e50b0269c85436 ]---
Comment 10 Chris Wilson 2018-10-23 15:34:37 UTC
Not the same. The chip not being reset across a PCI-level suspend is not the same thing as what is happening to ICL.
Comment 11 Chris Wilson 2019-02-09 17:28:44 UTC
(In reply to Martin Peres from comment #5)
> Your claim did not hold up to reality as it is still happening at every
> single run... try again?

I missed that this was about the drmtip runs, hence the confusion.


commit 0eb6a3f7ef99e7de19efb1293be0571b1d4e83cd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Feb 8 15:37:04 2019 +0000

    drm/i915: Force the GPU reset upon wedging
    
    When declaring the GPU wedged, we do need to hit the GPU with the reset
    hammer so that its state matches our presumed state during cleanup. If
    the reset fails, it fails, and we may be unhappy but wedged. However, if
    we are testing our wedge/unwedged handling, the desync carries over into
    the next test and promptly explodes.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=106702
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-3-chris@chris-wilson.co.uk
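The reasoning in this commit message can be illustrated with a small Python toy model. Everything in it is hypothetical (the `Gpu` and `Driver` classes, the `force_reset` flag, the `run` helper) and none of it is the i915 implementation. It only sketches why skipping the hardware reset when wedging can leave hardware and software state out of sync for whatever runs next.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gpu:
    """Toy hardware: remembers which contexts are still in flight."""
    in_flight: List[int] = field(default_factory=list)

    def reset(self) -> None:
        self.in_flight.clear()


@dataclass
class Driver:
    """Toy driver: tracks what it presumes the hardware is doing."""
    gpu: Gpu
    tracked: List[int] = field(default_factory=list)
    wedged: bool = False

    def submit(self, context_id: int) -> None:
        self.gpu.in_flight.append(context_id)
        self.tracked.append(context_id)

    def wedge(self, force_reset: bool) -> None:
        # Cleanup always discards the software tracking ...
        self.tracked.clear()
        self.wedged = True
        # ... but the hardware only matches if it is reset as well.
        if force_reset:
            self.gpu.reset()

    def unwedge(self) -> None:
        self.wedged = False

    def in_sync(self) -> bool:
        return self.gpu.in_flight == self.tracked


def run(force_reset: bool) -> bool:
    """Wedge with a request in flight, unwedge, and report whether the
    next test would start with hardware and software in sync."""
    driver = Driver(Gpu())
    driver.submit(0x10)
    driver.wedge(force_reset=force_reset)
    driver.unwedge()
    return driver.in_sync()


if __name__ == "__main__":
    print("without reset, in sync:", run(force_reset=False))
    print("with reset, in sync:", run(force_reset=True))
```

In the model, wedging without the reset leaves a stale context in the toy hardware while the driver believes nothing is in flight, which is the kind of desync the commit message describes carrying over into the next test.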
Comment 12 Martin Peres 2019-03-06 18:19:00 UTC
(In reply to Chris Wilson from comment #11)
> (In reply to Martin Peres from comment #5)
> > Your claim did not hold up to reality as it is still happening at every
> > single run... try again?
> 
> I missed that this was about the drmtip runs, hence the confusion.
> 
> 
> commit 0eb6a3f7ef99e7de19efb1293be0571b1d4e83cd
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Feb 8 15:37:04 2019 +0000
> 
>     drm/i915: Force the GPU reset upon wedging
>     
>     When declaring the GPU wedged, we do need to hit the GPU with the reset
>     hammer so that its state matches our presumed state during cleanup. If
>     the reset fails, it fails, and we may be unhappy but wedged. However, if
>     we are testing our wedge/unwedged handling, the desync carries over into
>     the next test and promptly explodes.
>     
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=106702
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>     Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
>     Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-3-chris@chris-wilson.co.uk

Thanks! This definitely fixed the issue!
Comment 13 CI Bug Log 2019-03-06 18:19:14 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

