Bug 106373 - [CI] igt@drv_module_reload@basic-reload - dmesg-fail - Failed assertion: __gem_execbuf(fd, execbuf) == 0 && Failed to idle engines, declaring wedged!
Summary: [CI] igt@drv_module_reload@basic-reload - dmesg-fail - Failed assertion: __ge...
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: XOrg git
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-03 12:03 UTC by Martin Peres
Modified: 2018-05-22 20:34 UTC (History)
1 user (show)

See Also:
i915 platform: BSW/CHT, GLK
i915 features: power/Other


Attachments
Kick softirqs harder (6.52 KB, patch)
2018-05-03 12:55 UTC, Chris Wilson
no flags Details | Splinter Review

Description Martin Peres 2018-05-03 12:03:05 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4459/fi-bsw-n3050/igt@drv_module_reload@basic-reload.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4131/fi-bsw-n3050/igt@drv_module_reload@basic-reload.html

Stderr:

(drv_module_reload:3999) ioctl_wrappers-CRITICAL: Test assertion failure function gem_execbuf, file ../lib/ioctl_wrappers.c:604:
(drv_module_reload:3999) ioctl_wrappers-CRITICAL: Failed assertion: __gem_execbuf(fd, execbuf) == 0
(drv_module_reload:3999) ioctl_wrappers-CRITICAL: error: -5 != 0
Subtest basic-reload failed.


Dmesg:

[  604.037258] i915 0000:00:02.0: Failed to idle engines, declaring wedged!
[  604.065263] Dumping ftrace buffer:
[  604.076772] ---------------------------------
[  604.076902] CPU:0 [LOST 10784293 EVENTS]
               prime_vg-3799    0.... 593757581us : reset_all_global_seqno.part.5: vcs0 seqno 0 (current 0) -> 0
[  604.081706] prime_vg-3799    0.... 593757634us : reset_all_global_seqno.part.5: vecs0 seqno 0 (current 0) -> 0
[  604.086555] ksoftirq-8       0d.s1 594774722us : __i915_request_submit: rcs0 fence 1a24:1 -> global=1, current 0
[  604.091317] ksoftirq-8       0..s. 594774748us : execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, global=1 (fence 1a24:1) (current 0), prio=0
[  604.096095] prime_vg-3799    0.Ns1 594775004us : execlists_submission_tasklet: rcs0 cs-irq head=5 [5?], tail=0 [0?]
[  604.100760] prime_vg-3799    0.Ns1 594775012us : execlists_submission_tasklet: rcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
[  604.107342] kworker/-2881    0..s1 594775148us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
[  604.113664] kworker/-2881    0..s1 594775154us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000002, active=0x5
[  604.119958] kworker/-2881    0..s1 594775161us : execlists_submission_tasklet: rcs0 out[0]: ctx=2.1, global=1 (fence 1a24:1) (current 1), prio=0
[  604.126357] kworker/-2881    0..s1 594775168us : execlists_submission_tasklet: rcs0 completed ctx=2
[  604.133314] prime_vg-3799    0.... 594781165us : i915_request_retire: rcs0 fence 1a24:1, global=1, current 1
[  604.140387] prime_vg-3799    0.... 594781550us : i915_request_retire: __retire_engine_request(rcs0) fence 1a24:1, global=1, current 1
[  604.147579] prime_vg-3803    0.... 595124991us : reset_all_global_seqno.part.5: rcs0 seqno 1 (current 1) -> 0
[  604.157882] prime_vg-3803    0.... 595125123us : reset_all_global_seqno.part.5: bcs0 seqno 0 (current 0) -> 0
[  604.163367] prime_vg-3803    0.... 595125132us : reset_all_global_seqno.part.5: vcs0 seqno 0 (current 0) -> 0
[  604.169161] prime_vg-3803    0.... 595125168us : reset_all_global_seqno.part.5: vecs0 seqno 0 (current 0) -> 0
[  604.175046] prime_vg-3806    0.... 595475051us : reset_all_global_seqno.part.5: rcs0 seqno 0 (current 0) -> 0
[  604.181123] prime_vg-3806    0.... 595475169us : reset_all_global_seqno.part.5: bcs0 seqno 0 (current 0) -> 0
[  604.187317] prime_vg-3806    0.... 595475178us : reset_all_global_seqno.part.5: vcs0 seqno 0 (current 0) -> 0
[  604.193802] prime_vg-3806    0.... 595475213us : reset_all_global_seqno.part.5: vecs0 seqno 0 (current 0) -> 0
[  604.200506] prime_vg-3809    0.... 595825811us : reset_all_global_seqno.part.5: rcs0 seqno 0 (current 0) -> 0
[  604.207464] prime_vg-3809    0.... 595826022us : reset_all_global_seqno.part.5: bcs0 seqno 0 (current 0) -> 0
[  604.214519] prime_vg-3809    0.... 595826032us : reset_all_global_seqno.part.5: vcs0 seqno 0 (current 0) -> 0
[  604.225001] prime_vg-3809    0.... 595826071us : reset_all_global_seqno.part.5: vecs0 seqno 0 (current 0) -> 0
[  604.232287] prime_vg-3809    0.... 595834175us : __i915_request_add: rcs0 fence 1a25:1
[  604.239725] prime_vg-3809    0d.s2 595834197us : __i915_request_submit: rcs0 fence 1a25:1 -> global=1, current 0
[  604.247446] prime_vg-3809    0..s1 595834207us : execlists_submission_tasklet: rcs0 in[0]:  ctx=1.1, global=1 (fence 1a25:1) (current 0), prio=0
[  604.255374] prime_vg-3809    0..s. 595834313us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?]
[  604.263402] prime_vg-3809    0..s. 595834316us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[  604.271677]   <idle>-0       0.Ns1 595834654us : execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?]
[  604.280103]   <idle>-0       0.Ns1 595834656us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00000018:0x00000001, active=0x5
[  604.288791]   <idle>-0       0.Ns1 595834658us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.1, global=1 (fence 1a25:1) (current 1), prio=0
[  604.297623]   <idle>-0       0.Ns1 595834659us : execlists_submission_tasklet: rcs0 completed ctx=1
[  604.306643] prime_vg-3809    0.... 595836056us : i915_request_retire: rcs0 fence 1a25:1, global=1, current 1
[  604.315898] prime_vg-3809    0.... 595836112us : i915_request_retire: __retire_engine_request(rcs0) fence 1a25:1, global=1, current 1
[  604.325402] systemd--175     0..s. 596168393us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
[  604.334988] systemd--175     0..s. 596168401us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000001:0x00000000, active=0x1
[  604.344538] rcu_pree-9       0.Ns2 596168618us : execlists_submission_tasklet: rcs0 cs-irq head=4 [4?], tail=5 [5?]
[  604.354238] rcu_pree-9       0.Ns2 596168624us : execlists_submission_tasklet: rcs0 csb[5]: status=0x00000018:0x00000002, active=0x5
[  604.364170] rcu_pree-9       0.Ns2 596168629us : execlists_submission_tasklet: rcs0 out[0]: ctx=2.1, global=1 (fence 1a26:1) (current 1), prio=0
[  604.394885] rcu_pree-9       0.Ns2 596168633us : execlists_submission_tasklet: rcs0 completed ctx=2
[  604.408112] drv_modu-3861    0.... 600730121us : __i915_request_add: rcs0 fence 5:8
[  604.419231] drv_modu-3861    0d.s2 600730157us : __i915_request_submit: rcs0 fence 5:8 -> global=1, current 0
[  604.430131] drv_modu-3861    0..s1 600730176us : execlists_submission_tasklet: rcs0 in[0]:  ctx=0.1, global=1 (fence 5:8) (current 0), prio=-1024
[  604.441270] drv_modu-3861    0..s1 600730225us : execlists_submission_tasklet: rcs0 cs-irq head=5 [5?], tail=0 [0?]
[  604.452664] drv_modu-3861    0..s1 600730227us : execlists_submission_tasklet: rcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
[  604.464190] ksoftirq-8       0..s. 600730319us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
[  604.475818] ksoftirq-8       0..s. 600730320us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000, active=0x5
[  604.487574] ksoftirq-8       0..s. 600730322us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, global=1 (fence 5:8) (current 1), prio=-1024
[  604.499482] ksoftirq-8       0..s. 600730324us : execlists_submission_tasklet: rcs0 completed ctx=0
[  604.511537] drv_modu-3861    0.... 600730335us : i915_request_retire: rcs0 fence 5:8, global=1, current 1
[  604.523765] drv_modu-3861    0.... 600730338us : i915_request_retire: __retire_engine_request(rcs0) fence 5:8, global=1, current 1
[  604.536258] drv_modu-3861    0.... 600730666us : intel_gpu_reset: engine_mask=ffffffff
[  604.548877] drv_modu-3861    0.... 601103025us : 0xffffffffa053ef24: engine_mask=ffffffff
[  604.561637] CPU:1 [LOST 4097146 EVENTS]
               drv_modu-3861    1.... 601928841us : 0xffffffffa050ec87: __retire_engine_request(vecs0) fence 1a3e:8, global=10, current 10
[  604.574438] drv_modu-3861    1.... 601928869us : 0xffffffffa0510d60: vecs0 fence 1a3e:9
[  604.574448] drv_modu-3861    1d.s2 601928877us : 0xffffffffa050f870: vecs0 fence 1a3e:9 -> global=11, current 10
[  604.587301] drv_modu-3861    1..s1 601928881us : 0xffffffffa052fbd5: vecs0 in[0]:  ctx=1.1, global=11 (fence 1a3e:9) (current 10), prio=0
[  604.600362] drv_modu-3861    1..s. 601928902us : 0xffffffffa052ecb7: vecs0 cs-irq head=1 [1?], tail=3 [3?]
[  604.613595] drv_modu-3861    1..s. 601928904us : 0xffffffffa052ed45: vecs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[  604.626886] drv_modu-3861    1..s. 601928904us : 0xffffffffa052ed45: vecs0 csb[3]: status=0x00000018:0x00000001, active=0x5
[  604.640224] drv_modu-3861    1..s. 601928906us : 0xffffffffa052ee24: vecs0 out[0]: ctx=1.1, global=11 (fence 1a3e:9) (current 11), prio=0
[  604.653606] drv_modu-3861    1..s. 601928907us : 0xffffffffa052ef50: vecs0 completed ctx=1
[  604.653616] drv_modu-3861    1.... 601928958us : 0xffffffffa050f720: vecs0 fence 1a3e:9, global=11, current 11
[  604.666940] drv_modu-3861    1.... 601928960us : 0xffffffffa050ea00: vecs0 fence 1a3e:9, global=11, current 11
[  604.680212] drv_modu-3861    1.... 601928962us : 0xffffffffa050ec87: __retire_engine_request(vecs0) fence 1a3e:9, global=11, current 11
[  604.693391] drv_modu-3861    1.... 601929169us : 0xffffffffa050ea00: bcs0 fence 1a3d:9, global=11, current 11
[  604.706575] drv_modu-3861    1.... 601929171us : 0xffffffffa050ec87: __retire_engine_request(bcs0) fence 1a3d:9, global=11, current 11
[  604.719726] drv_modu-3861    1.... 601929184us : 0xffffffffa050ea00: rcs0 fence 1a3b:18, global=20, current 20
[  604.732846] drv_modu-3861    1.... 601929186us : 0xffffffffa050ec87: __retire_engine_request(rcs0) fence 1a3b:18, global=20, current 20
[  604.746035] drv_modu-3861    1.... 601929195us : 0xffffffffa050ea00: vcs0 fence 1a3c:9, global=11, current 11
[  604.759250] drv_modu-3861    1.... 601929289us : 0xffffffffa050ec87: __retire_engine_request(vcs0) fence 1a3c:9, global=11, current 11
[  604.772556] drv_modu-3861    0.... 602834339us : 0xffffffffa0510d60: rcs0 fence 1a30:2
[  604.772567] drv_modu-3861    0d.s2 602834374us : 0xffffffffa050f870: rcs0 fence 1a30:2 -> global=21, current 20
[  604.786012] drv_modu-3861    0..s1 602834392us : 0xffffffffa052fbd5: rcs0 in[0]:  ctx=0.1, global=21 (fence 1a30:2) (current 20), prio=-1024
[  604.799760] drv_modu-3861    0.... 602834413us : 0xffffffffa0510d60: bcs0 fence 1a32:2
[  604.799770] drv_modu-3861    0d.s2 602834430us : 0xffffffffa050f870: bcs0 fence 1a32:2 -> global=12, current 11
[  604.813608] drv_modu-3861    0..s1 602834444us : 0xffffffffa052fbd5: bcs0 in[0]:  ctx=0.1, global=12 (fence 1a32:2) (current 11), prio=-1024
[  604.827462] drv_modu-3861    0.... 602834463us : 0xffffffffa0510d60: vcs0 fence 1a34:2
[  604.827472] systemd--3887    1..s. 602834478us : 0xffffffffa052ecb7: rcs0 cs-irq head=3 [3?], tail=4 [4?]
[  604.841401] drv_modu-3861    0d.s2 602834480us : 0xffffffffa050f870: vcs0 fence 1a34:2 -> global=12, current 11
[  604.841411] systemd--3887    1..s. 602834490us : 0xffffffffa052ed45: rcs0 csb[4]: status=0x00000001:0x00000000, active=0x1
[  604.862305] drv_modu-3861    0..s1 602834492us : 0xffffffffa052fbd5: vcs0 in[0]:  ctx=0.1, global=12 (fence 1a34:2) (current 11), prio=-1024
[  604.876391] systemd--3887    1..s. 602834493us : 0xffffffffa052ecb7: rcs0 cs-irq head=4 [4?], tail=5 [5?]
[  604.890434] drv_modu-3861    0.... 602834511us : 0xffffffffa0510d60: vecs0 fence 1a36:2
[  604.890444] systemd--3887    1..s. 602834512us : 0xffffffffa052ed45: rcs0 csb[5]: status=0x00000018:0x00000000, active=0x5
[  604.904568] systemd--3887    1..s. 602834514us : 0xffffffffa052ee24: rcs0 out[0]: ctx=0.1, global=21 (fence 1a30:2) (current 21), prio=-1024
[  604.904577] systemd--3887    1..s. 602834517us : 0xffffffffa052ef50: rcs0 completed ctx=0
[  605.268840] systemd--3887    1..s. 602834521us : 0xffffffffa052ecb7: bcs0 cs-irq head=3 [3?], tail=5 [5?]
[  605.268850] systemd--3887    1..s. 602834522us : 0xffffffffa052ed45: bcs0 csb[4]: status=0x00000001:0x00000000, active=0x1
[  605.268860] systemd--3887    1..s. 602834523us : 0xffffffffa052ed45: bcs0 csb[5]: status=0x00000018:0x00000000, active=0x5
[  605.268870] systemd--3887    1..s. 602834525us : 0xffffffffa052ee24: bcs0 out[0]: ctx=0.1, global=12 (fence 1a32:2) (current 12), prio=-1024
[  605.268878] systemd--3887    1..s. 602834526us : 0xffffffffa052ef50: bcs0 completed ctx=0
[  605.268887] drv_modu-3861    0d.s2 602834527us : 0xffffffffa050f870: vecs0 fence 1a36:2 -> global=12, current 11
[  605.268897] systemd--3887    1..s. 602834530us : 0xffffffffa052ecb7: vcs0 cs-irq head=3 [3?], tail=5 [5?]
[  605.268906] systemd--3887    1..s. 602834531us : 0xffffffffa052ed45: vcs0 csb[4]: status=0x00000001:0x00000000, active=0x1
[  605.268915] systemd--3887    1..s. 602834532us : 0xffffffffa052ed45: vcs0 csb[5]: status=0x00000018:0x00000000, active=0x5
[  605.268924] systemd--3887    1..s. 602834533us : 0xffffffffa052ee24: vcs0 out[0]: ctx=0.1, global=12 (fence 1a34:2) (current 12), prio=-1024
[  605.268933] systemd--3887    1..s. 602834534us : 0xffffffffa052ef50: vcs0 completed ctx=0
[  605.268943] drv_modu-3861    0..s1 602834540us : 0xffffffffa052fbd5: vecs0 in[0]:  ctx=0.1, global=12 (fence 1a36:2) (current 11), prio=-1024
[  605.268952] drv_modu-3861    0.... 602834549us : 0xffffffffa050ea00: vecs0 fence 1a36:2, global=12, current 12
[  605.268961] drv_modu-3861    0.... 602834552us : 0xffffffffa050ec87: __retire_engine_request(vecs0) fence 1a36:2, global=12, current 12
[  605.268970] drv_modu-3861    0.... 602834561us : 0xffffffffa050ea00: vcs0 fence 1a34:2, global=12, current 12
[  605.268980] drv_modu-3861    0.... 602834563us : 0xffffffffa050ec87: __retire_engine_request(vcs0) fence 1a34:2, global=12, current 12
[  605.268989] systemd--3887    1..s. 602834572us : 0xffffffffa052ecb7: vecs0 cs-irq head=3 [3?], tail=5 [5?]
[  605.268998] systemd--3887    1..s. 602834573us : 0xffffffffa052ed45: vecs0 csb[4]: status=0x00000001:0x00000000, active=0x1
[  605.269007] systemd--3887    1..s. 602834574us : 0xffffffffa052ed45: vecs0 csb[5]: status=0x00000018:0x00000000, active=0x5
[  605.269017] systemd--3887    1..s. 602834576us : 0xffffffffa052ee24: vecs0 out[0]: ctx=0.1, global=12 (fence 1a36:2) (current 12), prio=-1024
[  605.269026] drv_modu-3861    0.... 602834579us : 0xffffffffa050ea00: bcs0 fence 1a32:2, global=12, current 12
[  605.269036] drv_modu-3861    0.... 602834582us : 0xffffffffa050ec87: __retire_engine_request(bcs0) fence 1a32:2, global=12, current 12
[  605.269045] systemd--3887    1..s. 602834598us : 0xffffffffa052ef50: vecs0 completed ctx=0
[  605.269054] drv_modu-3861    0.... 602834605us : 0xffffffffa050ea00: rcs0 fence 1a30:2, global=21, current 21
[  605.269063] drv_modu-3861    0.... 602834606us : 0xffffffffa050ec87: __retire_engine_request(rcs0) fence 1a30:2, global=21, current 21
[  605.269071] drv_modu-3861    0.... 602836550us : 0xffffffffa053ef24: engine_mask=ffffffff
[  605.269160] drv_modu-3861    0.... 603127421us : intel_gpu_reset: engine_mask=ffffffff
[  605.269237] drv_modu-3861    1.... 603210067us : __i915_request_add: rcs0 fence 1a4b:1
[  605.269315] drv_modu-3861    1d.s2 603210082us : __i915_request_submit: rcs0 fence 1a4b:1 -> global=1, current 0
[  605.269395] drv_modu-3861    1..s1 603210089us : execlists_submission_tasklet: rcs0 in[0]:  ctx=1.1, global=1 (fence 1a4b:1) (current 0), prio=0
[  605.269474] drv_modu-3861    1..s. 603210110us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0], tail=0 [0]
[  605.269553] drv_modu-3861    1..s. 603210146us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
[  605.269633] drv_modu-3861    1..s. 603210148us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000001, active=0x1
[  605.269712] drv_modu-3861    1..s. 603210149us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.1, global=1 (fence 1a4b:1) (current 1), prio=0
[  605.269791] drv_modu-3861    1..s. 603210151us : execlists_submission_tasklet: rcs0 completed ctx=1
[  605.269868] drv_modu-3861    1.... 603210469us : __i915_request_add: bcs0 fence 1a4c:1
[  605.269945] drv_modu-3861    1d.s2 603210478us : __i915_request_submit: bcs0 fence 1a4c:1 -> global=1, current 0
[  605.270025] drv_modu-3861    1..s1 603210483us : execlists_submission_tasklet: bcs0 in[0]:  ctx=1.1, global=1 (fence 1a4c:1) (current 0), prio=0
[  605.270105] drv_modu-3861    1..s. 603210513us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
[  605.270184] drv_modu-3861    1..s. 603210514us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000001, active=0x1
[  605.270264] drv_modu-3861    1..s. 603210516us : execlists_submission_tasklet: bcs0 out[0]: ctx=1.1, global=1 (fence 1a4c:1) (current 1), prio=0
[  605.270343] drv_modu-3861    1..s. 603210517us : execlists_submission_tasklet: bcs0 completed ctx=1
[  605.270420] drv_modu-3861    1.... 603210763us : __i915_request_add: vcs0 fence 1a4d:1
[  605.270497] drv_modu-3861    1d.s2 603210771us : __i915_request_submit: vcs0 fence 1a4d:1 -> global=1, current 0
[  605.270577] drv_modu-3861    1..s1 603210775us : execlists_submission_tasklet: vcs0 in[0]:  ctx=1.1, global=1 (fence 1a4d:1) (current 0), prio=0
[  605.270656] drv_modu-3861    1..s. 603210806us : execlists_submission_tasklet: vcs0 cs-irq head=0 [0], tail=1 [1]
[  605.270736] drv_modu-3861    1..s. 603210808us : execlists_submission_tasklet: vcs0 csb[1]: status=0x00000018:0x00000001, active=0x1
[  605.270816] drv_modu-3861    1..s. 603210809us : execlists_submission_tasklet: vcs0 out[0]: ctx=1.1, global=1 (fence 1a4d:1) (current 1), prio=0
[  605.270894] drv_modu-3861    1..s. 603210810us : execlists_submission_tasklet: vcs0 completed ctx=1
[  605.270971] drv_modu-3861    1.... 603211053us : __i915_request_add: vecs0 fence 1a4e:1
[  605.271049] drv_modu-3861    1d.s2 603211062us : __i915_request_submit: vecs0 fence 1a4e:1 -> global=1, current 0
[  605.271129] drv_modu-3861    1..s1 603211070us : execlists_submission_tasklet: vecs0 in[0]:  ctx=1.1, global=1 (fence 1a4e:1) (current 0), prio=0
[  605.271209] drv_modu-3861    1..s1 603211096us : execlists_submission_tasklet: vecs0 cs-irq head=0 [0], tail=1 [1]
[  605.271288] drv_modu-3861    1..s1 603211097us : execlists_submission_tasklet: vecs0 csb[1]: status=0x00000018:0x00000001, active=0x1
[  605.271368] drv_modu-3861    1..s1 603211099us : execlists_submission_tasklet: vecs0 out[0]: ctx=1.1, global=1 (fence 1a4e:1) (current 1), prio=0
[  605.271447] drv_modu-3861    1..s1 603211100us : execlists_submission_tasklet: vecs0 completed ctx=1
[  605.271524] drv_modu-3861    1.... 603211102us : i915_request_retire: vcs0 fence 1a4d:1, global=1, current 1
[  605.271602] drv_modu-3861    1.... 603211104us : i915_request_retire: __retire_engine_request(vcs0) fence 1a4d:1, global=1, current 1
[  605.271680] drv_modu-3861    1.... 603211121us : i915_request_retire: bcs0 fence 1a4c:1, global=1, current 1
[  605.271758] drv_modu-3861    1.... 603211123us : i915_request_retire: __retire_engine_request(bcs0) fence 1a4c:1, global=1, current 1
[  605.271836] drv_modu-3861    1.... 603211133us : i915_request_retire: rcs0 fence 1a4b:1, global=1, current 1
[  605.271913] drv_modu-3861    1.... 603211162us : i915_request_retire: __retire_engine_request(rcs0) fence 1a4b:1, global=1, current 1
[  605.271990] drv_modu-3861    1.... 603211191us : __i915_request_add: rcs0 fence 1a44:1
[  605.272068] drv_modu-3861    1d.s2 603211218us : __i915_request_submit: rcs0 fence 1a44:1 -> global=2, current 1
[  605.272148] drv_modu-3861    1..s1 603211234us : execlists_submission_tasklet: rcs0 in[0]:  ctx=0.1, global=2 (fence 1a44:1) (current 1), prio=-1024
[  605.272227] drv_modu-3861    1..s. 603211272us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=3 [3?]
[  605.272307] drv_modu-3861    1..s. 603211273us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[  605.272386] drv_modu-3861    1..s. 603211274us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00000018:0x00000000, active=0x5
[  605.272466] drv_modu-3861    1..s. 603211275us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, global=2 (fence 1a44:1) (current 2), prio=-1024
[  605.272545] drv_modu-3861    1..s. 603211276us : execlists_submission_tasklet: rcs0 completed ctx=0
[  605.272622] drv_modu-3861    1.... 603211297us : __i915_request_add: bcs0 fence 1a46:1
[  605.272699] drv_modu-3861    1d.s2 603211314us : __i915_request_submit: bcs0 fence 1a46:1 -> global=2, current 1
[  605.272779] drv_modu-3861    1..s1 603211327us : execlists_submission_tasklet: bcs0 in[0]:  ctx=0.1, global=2 (fence 1a46:1) (current 1), prio=-1024
[  605.272858] drv_modu-3861    1..s. 603211347us : execlists_submission_tasklet: bcs0 cs-irq head=1 [1?], tail=3 [3?]
[  605.272938] drv_modu-3861    1..s. 603211349us : execlists_submission_tasklet: bcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[  605.273017] drv_modu-3861    1..s. 603211350us : execlists_submission_tasklet: bcs0 csb[3]: status=0x00000018:0x00000000, active=0x5
[  605.273097] drv_modu-3861    1..s. 603211351us : execlists_submission_tasklet: bcs0 out[0]: ctx=0.1, global=2 (fence 1a46:1) (current 2), prio=-1024
[  605.273175] drv_modu-3861    1..s. 603211352us : execlists_submission_tasklet: bcs0 completed ctx=0
[  605.273252] drv_modu-3861    1.... 603211372us : __i915_request_add: vcs0 fence 1a48:1
[  605.273330] drv_modu-3861    1d.s2 603211388us : __i915_request_submit: vcs0 fence 1a48:1 -> global=2, current 1
[  605.273410] drv_modu-3861    1..s1 603211401us : execlists_submission_tasklet: vcs0 in[0]:  ctx=0.1, global=2 (fence 1a48:1) (current 1), prio=-1024
[  605.273489] drv_modu-3861    1..s. 603211429us : execlists_submission_tasklet: vcs0 cs-irq head=1 [1?], tail=3 [3?]
[  605.273569] drv_modu-3861    1..s. 603211430us : execlists_submission_tasklet: vcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[  605.273648] drv_modu-3861    1..s. 603211431us : execlists_submission_tasklet: vcs0 csb[3]: status=0x00000018:0x00000000, active=0x5
[  605.273728] drv_modu-3861    1..s. 603211432us : execlists_submission_tasklet: vcs0 out[0]: ctx=0.1, global=2 (fence 1a48:1) (current 2), prio=-1024
[  605.273806] drv_modu-3861    1..s. 603211433us : execlists_submission_tasklet: vcs0 completed ctx=0
[  605.273883] drv_modu-3861    1.... 603211447us : __i915_request_add: vecs0 fence 1a4a:1
[  605.273961] drv_modu-3861    1d.s2 603211463us : __i915_request_submit: vecs0 fence 1a4a:1 -> global=2, current 1
[  605.274041] drv_modu-3861    1..s1 603211476us : execlists_submission_tasklet: vecs0 in[0]:  ctx=0.1, global=2 (fence 1a4a:1) (current 1), prio=-1024
[  605.274120] drv_modu-3861    1..s. 603211514us : execlists_submission_tasklet: vecs0 cs-irq head=1 [1?], tail=3 [3?]
[  605.274200] drv_modu-3861    1..s. 603211515us : execlists_submission_tasklet: vecs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[  605.274279] drv_modu-3861    1..s. 603211516us : execlists_submission_tasklet: vecs0 csb[3]: status=0x00000018:0x00000000, active=0x5
[  605.274359] drv_modu-3861    1..s. 603211518us : execlists_submission_tasklet: vecs0 out[0]: ctx=0.1, global=2 (fence 1a4a:1) (current 2), prio=-1024
[  605.274438] drv_modu-3861    1..s. 603211518us : execlists_submission_tasklet: vecs0 completed ctx=0
[  605.274515] drv_modu-3861    1.... 603211523us : i915_request_retire: vecs0 fence 1a4a:1, global=2, current 2
[  605.274593] drv_modu-3861    1.... 603211524us : i915_request_retire: __retire_engine_request(vecs0) fence 1a4e:1, global=1, current 2
[  605.274670] drv_modu-3861    1.... 603211527us : i915_request_retire: __retire_engine_request(vecs0) fence 1a4a:1, global=2, current 2
[  605.274748] drv_modu-3861    1.... 603211538us : i915_request_retire: vcs0 fence 1a48:1, global=2, current 2
[  605.274825] drv_modu-3861    1.... 603211540us : i915_request_retire: __retire_engine_request(vcs0) fence 1a48:1, global=2, current 2
[  605.274903] drv_modu-3861    1.... 603211553us : i915_request_retire: bcs0 fence 1a46:1, global=2, current 2
[  605.274981] drv_modu-3861    1.... 603211554us : i915_request_retire: __retire_engine_request(bcs0) fence 1a46:1, global=2, current 2
[  605.275058] drv_modu-3861    1.... 603211565us : i915_request_retire: rcs0 fence 1a44:1, global=2, current 2
[  605.275136] drv_modu-3861    1.... 603211567us : i915_request_retire: __retire_engine_request(rcs0) fence 1a44:1, global=2, current 2
[  605.275213] drv_modu-3861    1.... 603211578us : i915_request_retire: vecs0 fence 1a4e:1, global=1, current 2
[  605.275290] drv_modu-3861    0.... 603834687us : __i915_request_add: rcs0 fence 1a4f:1
[  605.275368] ksoftirq-8       0d.s1 603839868us : __i915_request_submit: rcs0 fence 1a4f:1 -> global=3, current 2
[  605.275447] ksoftirq-8       0..s. 603839884us : execlists_submission_tasklet: rcs0 in[0]:  ctx=1.1, global=3 (fence 1a4f:1) (current 2), prio=0
[  605.275527] kworker/-107     1..s. 603839904us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
[  605.275606] kworker/-107     1..s. 603839908us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000001:0x00000000, active=0x1
[  605.275686] kworker/-107     1..s1 603840094us : execlists_submission_tasklet: rcs0 cs-irq head=4 [4?], tail=5 [5?]
[  605.275765] kworker/-107     1..s1 603840096us : execlists_submission_tasklet: rcs0 csb[5]: status=0x00000018:0x00000001, active=0x5
[  605.275845] kworker/-107     1..s1 603840098us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.1, global=3 (fence 1a4f:1) (current 3), prio=0
[  605.275924] kworker/-107     1..s1 603840100us : execlists_submission_tasklet: rcs0 completed ctx=1
[  605.276001] drv_modu-3861    0.... 603844148us : i915_request_retire: rcs0 fence 1a4f:1, global=3, current 3
[  605.276079] drv_modu-3861    0.... 603844269us : i915_request_retire: __retire_engine_request(rcs0) fence 1a4f:1, global=3, current 3
[  605.276156] drv_modu-3861    0.... 603887660us : __i915_request_add: rcs0 fence 1a4f:2
[  605.276233] ksoftirq-8       0d.s1 603887928us : __i915_request_submit: rcs0 fence 1a4f:2 -> global=4, current 3
[  605.276313] ksoftirq-8       0..s. 603887943us : execlists_submission_tasklet: rcs0 in[0]:  ctx=1.1, global=4 (fence 1a4f:2) (current 3), prio=0
[  605.276390] drv_modu-3861    0.... 603889166us : i915_request_retire_upto: rcs0 fence 1a4f:2, global=4, current 4
[  605.276468] drv_modu-3861    0.... 603889168us : i915_request_retire: rcs0 fence 1a4f:2, global=4, current 4
[  605.276545] drv_modu-3861    0.... 603889213us : i915_request_retire: __retire_engine_request(rcs0) fence 1a4f:2, global=4, current 4
[  605.276625] ksoftirq-18      1..s. 604093458us : execlists_submission_tasklet: rcs0 cs-irq head=5 [5?], tail=1 [1?]
[  605.276704] ksoftirq-18      1..s. 604093461us : execlists_submission_tasklet: rcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
[  605.276783] ksoftirq-18      1..s. 604093462us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000001, active=0x5
[  605.276862] ksoftirq-18      1..s. 604093463us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.1, global=4 (fence 1a4f:2) (current 4), prio=0
[  605.276940] ksoftirq-18      1..s. 604093482us : execlists_submission_tasklet: rcs0 completed ctx=1
[  605.276942] ---------------------------------
Comment 1 Chris Wilson 2018-05-03 12:55:34 UTC
Created attachment 139306 [details] [review]
Kick softirqs harder

In both cases it is the 200+ms latency caused by ksoftirqd, falling afoul of our 200ms timeout before declaring bankruptcy. Our loop uses usleep and schedules, so it is not like ksoftirqd doesn't have the opportunity to run. I've been very tempted to execute the tasklet directly, but that doesn't solve the issue that we may encounter the 200ms delay while trying to execute things on the GPU.

This is one of the reasons why we need patches like https://patchwork.freedesktop.org/patch/219353/

What I am using at the moment is attached.
Comment 2 Martin Peres 2018-05-03 14:58:47 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4129/shard-glk1/igt@drv_module_reload@basic-reload-inject.html

Stderr:
(drv_module_reload:7296) ioctl_wrappers-CRITICAL: Test assertion failure function gem_execbuf, file ../lib/ioctl_wrappers.c:604:
(drv_module_reload:7296) ioctl_wrappers-CRITICAL: Failed assertion: __gem_execbuf(fd, execbuf) == 0
(drv_module_reload:7296) ioctl_wrappers-CRITICAL: error: -5 != 0
Test drv_module_reload failed.

Dmesg:
[ 1513.231729] i915 0000:00:02.0: Failed to idle engines, declaring wedged!

Then, the following test produced the following assert.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4129/shard-glk1/igt@gem_render_copy_redux@flink-interruptible.html

gem_render_copy_redux: ../lib/rendercopy_gen9.c:143: gen6_render_flush: Assertion `ret == 0' failed.
Received signal SIGABRT.

This is similar to https://bugs.freedesktop.org/show_bug.cgi?id=106064.
Comment 3 Martin Peres 2018-05-03 16:15:48 UTC
Also seen on igt@drv_module_reload@basic-no-display:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4134/fi-bsw-n3050/igt@drv_module_reload@basic-no-display.html
Comment 4 Chris Wilson 2018-05-19 14:32:17 UTC
Should be improved by

commit dd0cf235d81f24c1ba80c4a000bafc9f2dce3840
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun May 6 18:13:28 2018 +0100

    drm/i915: Speed up idle detection by kicking the tasklets
    
    We rely on ksoftirqd to run in a timely fashion in order to drain the
    execlists queue. Quite frequently, it does not. In some cases we may see
    latencies of over 200ms triggering our idle timeouts and forcing us to
    declare the driver wedged!
    
    Thus we can speed up idle detection by bypassing ksoftirqd in these
    cases and flush our tasklet to confirm if we are indeed still waiting
    for the ELSP to drain.
    
    v2: Put the execlists.first check back; it is required for handling
    reset!
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180506171328.30034-1-chris@chris-wilson.co.uk
Comment 5 Martin Peres 2018-05-22 20:34:23 UTC
(In reply to Chris Wilson from comment #4)
> Should be improved by
> 
> commit dd0cf235d81f24c1ba80c4a000bafc9f2dce3840
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sun May 6 18:13:28 2018 +0100
> 
>     drm/i915: Speed up idle detection by kicking the tasklets
>     
>     We rely on ksoftirqd to run in a timely fashion in order to drain the
>     execlists queue. Quite frequently, it does not. In some cases we may see
>     latencies of over 200ms triggering our idle timeouts and forcing us to
>     declare the driver wedged!
>     
>     Thus we can speed up idle detection by bypassing ksoftirqd in these
>     cases and flush our tasklet to confirm if we are indeed still waiting
>     for the ELSP to drain.
>     
>     v2: Put the execlists.first check back; it is required for handling
>     reset!
>     
>     References: https://bugs.freedesktop.org/show_bug.cgi?id=106373
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>     Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>     Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>     Link:
> https://patchwork.freedesktop.org/patch/msgid/20180506171328.30034-1-
> chris@chris-wilson.co.uk

Seems like it did the trick. We used to reproduce the issue every 5-20 runs, and we have not hit it in 200 runs now. Thanks!


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.