Bug 111598 - [CI][SHARDS] igt@gem_sync@basic-all - fail - Failed assertion: !"GPU hung"
Summary: [CI][SHARDS] igt@gem_sync@basic-all - fail - Failed assertion: !"GPU hung"
Status: NEEDINFO
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: low, not set
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
Reported: 2019-09-09 08:16 UTC by Martin Peres
Modified: 2019-10-31 18:36 UTC

See Also:
i915 platform: SNB
i915 features: GEM/Other


Description Martin Peres 2019-09-09 08:16:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3416/shard-snb4/igt@gem_sync@basic-all.html

Starting subtest: basic-all
(gem_sync:4753) igt_aux-CRITICAL: Test assertion failure function sig_abort, file ../lib/igt_aux.c:502:
(gem_sync:4753) igt_aux-CRITICAL: Failed assertion: !"GPU hung"
Comment 1 CI Bug Log 2019-09-09 08:18:02 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SNB: igt@gem_sync@basic-all - fail - Failed assertion: !"GPU hung"
  - https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_3416/shard-snb4/igt@gem_sync@basic-all.html
Comment 2 Chris Wilson 2019-09-09 08:25:25 UTC
<7> [2140.194405] hangcheck rcs0
<7> [2140.194409] hangcheck 	Awake? 1
<7> [2140.194411] hangcheck 	Hangcheck: 12032 ms ago
<7> [2140.194413] hangcheck 	Reset count: 0 (global 49)
<7> [2140.194414] hangcheck 	Requests:
<7> [2140.194417] hangcheck 	MMIO base:  0x00002000
<7> [2140.194420] hangcheck 	CCID: 0x7fff610d
<7> [2140.194423] hangcheck 	RING_START: 0x00001000
<7> [2140.194425] hangcheck 	RING_HEAD:  0x00001350
<7> [2140.194428] hangcheck 	RING_TAIL:  0x00000638
<7> [2140.194432] hangcheck 	RING_CTL:   0x00003001
<7> [2140.194435] hangcheck 	RING_MODE:  0x00004040
<7> [2140.194438] hangcheck 	RING_IMR: fffffffe
<7> [2140.194440] hangcheck 	ACTHD:  0x00000000_fa101a00
<7> [2140.194443] hangcheck 	BBADDR: 0x00000000_fa104879
<7> [2140.194446] hangcheck 	DMA_FADDR: 0x00000000_fa107400
<7> [2140.194448] hangcheck 	IPEIR: 0x00000000
<7> [2140.194451] hangcheck 	IPEHR: 0x0042001e
<7> [2140.194455] hangcheck 		E  3:1e3294-  prio=2147483647 @ 13752ms: [i915]
<7> [2140.194457] hangcheck HWSP:
<7> [2140.194460] hangcheck [0000] 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [2140.194463] hangcheck [0020] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [2140.194464] hangcheck *
<7> [2140.194466] hangcheck [0100] 001e3292 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [2140.194469] hangcheck [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<7> [2140.194470] hangcheck *
<7> [2140.194475] hangcheck Idle? no
<7> [2140.194477] hangcheck Signals:
<7> [2140.194479] hangcheck 	[3:1e3294] @ 13752ms

IPEHR doesn't correspond to any plausible command. Could this be another instance of the strange behaviour where the ringbuffer overshoots its TAIL?
Comment 3 Chris Wilson 2019-09-10 14:33:44 UTC
Stuck in

commit 0efa99dd58754d23e884b9ba41cd601f01b58c3d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Sep 9 12:30:18 2019 +0100

    drm/i915/ringbuffer: Flush writes before RING_TAIL update

to see if that makes any difference. Come back in 6-12 months' time to find out!
Comment 4 ashutosh.dixit 2019-10-31 18:36:08 UTC
Bug Assessment: There are no records of any failures in CI Bug Log, which also shows a 0% reproduction rate. The issue affects SNB only, but SNB runs are clean here:

https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_sync@basic-all.html

Is it possibly a one-off? I am reducing the priority to low. Can it be closed?

