Bug 102657 - [BAT][BYT only] igt@* Incomplete - timeout/system hang
Summary: [BAT][BYT only] igt@* Incomplete - timeout/system hang
Status: NEEDINFO
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL: https://bugzilla.kernel.org/show_bug....
Whiteboard: ReadyForDev
Keywords:
Duplicates: 102547 102619 103411
Depends on:
Blocks: 105984
Reported: 2017-09-11 10:33 UTC by Martin Peres
Modified: 2018-11-23 11:17 UTC
CC List: 3 users

See Also:
i915 platform: BYT
i915 features: GEM/Other


Description Martin Peres 2017-09-11 10:33:38 UTC
On CI_DRM_3066, the machine fi-byt-j1900 hung hard while running igt@gem_exec_store@basic-default.

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3066/fi-byt-j1900/igt@gem_exec_store@basic-default.html
Comment 1 Chris Wilson 2017-09-11 10:38:49 UTC
BYT does have a known hard hang somewhere between the punit and C-states. Maybe we should use intel_idle.max_cstate=1 for the farm for stability reasons?
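A minimal sketch, assuming Python 3 is available on the CI host, of how one might confirm that a machine actually booted with the suggested parameter. The helper max_cstate_from_cmdline is hypothetical: it only scans a kernel command line string (such as the contents of /proc/cmdline) for intel_idle.max_cstate, so it shows what was requested at boot, not whether the intel_idle driver honoured it.

```python
from typing import Optional


def max_cstate_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the intel_idle.max_cstate value found in a kernel command line, or None.

    Hypothetical helper: if the parameter appears more than once, this sketch
    simply keeps the last numeric value it sees.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        # Only accept "intel_idle.max_cstate=<digits>"; ignore malformed values.
        if key == "intel_idle.max_cstate" and sep and raw.isdigit():
            value = int(raw)
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("intel_idle.max_cstate =", max_cstate_from_cmdline(f.read()))
    except OSError:
        print("/proc/cmdline is not readable on this system")
```

A result of None means the option is missing from the boot command line, which is the situation later reported for fi-byt-clapper before the option was added.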
Comment 2 Marta Löfstedt 2017-10-17 06:39:24 UTC
This also looks like a suspicious system hang:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3250/fi-byt-j1900/igt@gem_cpu_reloc@basic.html

Stray after last dmesg:
<7>[   65.159738] [IGT] gem_close_race: executing

run.log has:

[022/289] skip: 9, pass: 13 -   
FATAL: command execution failed
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
Comment 3 Marta Löfstedt 2017-11-01 08:13:13 UTC
*** Bug 103411 has been marked as a duplicate of this bug. ***
Comment 4 Marta Löfstedt 2017-11-01 08:15:20 UTC
*** Bug 102619 has been marked as a duplicate of this bug. ***
Comment 5 Marta Löfstedt 2017-11-01 08:16:04 UTC
*** Bug 102547 has been marked as a duplicate of this bug. ***
Comment 6 Marta Löfstedt 2017-11-21 06:40:02 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3994/fi-byt-j1900/igt@chamelium@dp-edid-read.html

This looks weird.

This is all in dmesg:
<5>[   47.814867] owatch: Using watchdog device /dev/watchdog0
<5>[   47.815313] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   47.818658] owatch: timeout for /dev/watchdog0 set to 100 (requested 100)
<6>[   62.849318] Console: switching to colour dummy device 80x25
<7>[   62.849812] [IGT] chamelium: executing
<7>[   65.263503] [IGT] chamelium: exiting, ret=77
<7>[   65.474299] [IGT] chamelium: executing

run.log:
[000/289]  |                      
skip: igt/chamelium/dp-hpd-fast

[001/289] skip: 1 |
running: igt/chamelium/dp-edid-read

[001/289] skip: 1 /                
FATAL: command execution failed
java.io.EOFException
Comment 7 Marta Löfstedt 2017-12-05 13:29:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3458/fi-byt-j1900/igt@gem_exec_reloc@basic-gtt-cpu.html

last dmesg:
<4>[  216.579095] Setting dangerous option reset - tainting kernel
<7>[  216.580828] [IGT] gem_exec_reloc: exiting, ret=0
<7>[  216.765572] [IGT] gem_exec_reloc: executing

run.log:
running: igt/gem_exec_reloc/basic-gtt-cpu

[073/288] skip: 9, pass: 64 /            
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3458/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 548 seconds
Rebooting fi-byt-j1900
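Every incomplete in this report leaves the same run.log shape: a "running:" line naming the test, a progress line, then "FATAL: command execution failed". A minimal sketch, assuming that format (inferred only from the excerpts quoted here), of a hypothetical helper find_incomplete_test that extracts the test that was running when the host stopped responding. In these excerpts the [NNN/TTT] counter equals skip + pass (e.g. 9 + 64 = 073), so it appears to count finished tests.

```python
import re
from typing import Optional, Tuple

# Progress lines look like "[073/288] skip: 9, pass: 64 /" in the excerpts above.
PROGRESS_RE = re.compile(r"\[(\d+)/(\d+)\]")


def find_incomplete_test(run_log: str) -> Optional[Tuple[str, Optional[int], Optional[int]]]:
    """Return (test, finished, total) for the test running when FATAL appeared, else None."""
    last_test = None
    last_progress = None
    for line in run_log.splitlines():
        line = line.strip()
        if line.startswith("running:"):
            last_test = line[len("running:"):].strip()
        match = PROGRESS_RE.match(line)
        if match:
            last_progress = (int(match.group(1)), int(match.group(2)))
        if line.startswith("FATAL: command execution failed"):
            if last_test is None:
                return None
            finished, total = last_progress if last_progress else (None, None)
            return last_test, finished, total
    # No FATAL line: the run did not end in an incomplete.
    return None
```

Applied to the run.log excerpt in this comment, the sketch would report igt/gem_exec_reloc/basic-gtt-cpu with 73 of 288 tests finished.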
Comment 8 Marta Löfstedt 2017-12-07 07:20:48 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3460/fi-byt-j1900/igt@gem_exec_reloc@basic-gtt-read.html

last dmesg:
<4>[  223.768697] Setting dangerous option reset - tainting kernel
<7>[  223.770112] [IGT] gem_exec_reloc: exiting, ret=0
<7>[  223.965041] [IGT] gem_exec_reloc: executing

run.log:
running: igt/gem_exec_reloc/basic-gtt-read

[075/288] skip: 9, pass: 66 \             
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3460/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 553 seconds
Rebooting fi-byt-j1900
Comment 9 Marta Löfstedt 2017-12-13 06:32:44 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3502/fi-byt-j1900/igt@gem_exec_gttfill@basic.html

last dmesg:
<7>[  187.956708] [IGT] gem_exec_flush: starting subtest basic-wb-set-default
<7>[  193.355585] [IGT] gem_exec_flush: exiting, ret=0
<7>[  193.547687] [IGT] gem_exec_gttfill: executing

run.log:
running: igt/gem_exec_gttfill/basic

[064/288] skip: 9, pass: 55 |      
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3502/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 555 seconds
Rebooting fi-byt-j1900
Comment 10 Marta Löfstedt 2018-01-02 07:15:11 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3554/fi-byt-j1900/igt@gem_exec_reloc@basic-gtt-read-noreloc.html

run.log:
running: igt/gem_exec_reloc/basic-gtt-read-noreloc

[084/288] skip: 9, pass: 75 |                     
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3554/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 548 seconds
Rebooting fi-byt-j1900

dmesg:
<7>[  223.809430] [IGT] gem_exec_reloc: starting subtest basic-cpu-read-noreloc
<4>[  223.816317] Setting dangerous option reset - tainting kernel
<7>[  223.817618] [IGT] gem_exec_reloc: exiting, ret=0
<7>[  223.991672] [IGT] gem_exec_reloc: executing
Comment 11 Marta Löfstedt 2018-01-04 07:27:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3595/fi-byt-j1900/igt@gem_exec_reloc@basic-write-cpu.html

dmesg:
<7>[  246.092851] [IGT] gem_exec_reloc: starting subtest basic-gtt-read
<4>[  246.099140] Setting dangerous option reset - tainting kernel
<7>[  246.100254] [IGT] gem_exec_reloc: exiting, ret=0
Followed by a stray.

run.log:
running: igt/gem_exec_reloc/basic-write-cpu

[076/288] skip: 9, pass: 67 |              
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3595/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 551 seconds
Rebooting fi-byt-j1900
Comment 12 Jani Saarinen 2018-01-09 07:06:22 UTC
Reference (trybot): https://patchwork.freedesktop.org/series/36157/
Comment 13 Marta Löfstedt 2018-01-25 12:42:41 UTC
This is a meta bug for incompletes on BYT.
Comment 14 Marta Löfstedt 2018-02-02 07:21:19 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3714/fi-byt-j1900/igt@gem_exec_reloc@basic-cpu-gtt-active.html

last dmesg:
<4>[  240.653543] Setting dangerous option reset - tainting kernel
<7>[  240.654440] [IGT] gem_exec_reloc: exiting, ret=0
<7>[  240.844582] [IGT] gem_exec_reloc: executing
Comment 16 Marta Löfstedt 2018-03-29 08:26:29 UTC
Last seen: IGT_4302: 2018-02-26 / 249 runs ago
Comment 18 Marta Löfstedt 2018-04-04 06:53:39 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_10/fi-byt-j1900/igt@drm_import_export@flink.html

run.log:
running: igt/drm_import_export/flink

[26/98] skip: 13, pass: 13 -        
FATAL: command execution failed
...
Completed CI_IGT_test drmtip_10/fi-byt-j1900/25 : FAILURE
CI_IGT_test runtime 456 seconds
Rebooting fi-byt-j1900

dmesg:
<7>[  222.720447] [IGT] drm_import_export: executing
<7>[  222.738358] [IGT] drm_import_export: starting subtest flink
<5>[  223.568659] random: crng init done
Comment 19 Francesco Balestrieri 2018-06-12 12:51:51 UTC
Last seen 2 days, 10 hours ago according to Cibuglogger
Comment 20 Francesco Balestrieri 2018-08-14 07:41:39 UTC
Has anybody tried Chris' suggestion above: "Maybe we should use intel_idle.max_cstate=1 for the farm for stability reasons?"
Comment 21 Francesco Balestrieri 2018-10-10 07:06:54 UTC
Changing to NEEDINFO while waiting for the answer to the above question.
Comment 22 Martin Peres 2018-10-10 10:27:30 UTC
(In reply to Francesco Balestrieri from comment #20)
> Has anybody tried Chris' suggestion above: "Maybe we should use
> intel_idle.max_cstate=1 for the farm for stability reasons?"

Tomi, it seems this was never applied to the grub command line (https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4919/fi-byt-clapper/boot0.log).

Would it be possible to add it?
Comment 23 Francesco Balestrieri 2018-11-08 09:41:20 UTC
Ping?
Comment 24 Tomi Sarvela 2018-11-08 12:09:37 UTC
Added option intel_idle.max_cstate=1 for fi-byt-clapper, added host to https://intel-gfx-ci.01.org/hardware.html
Comment 25 Martin Peres 2018-11-08 13:01:10 UTC
(In reply to Tomi Sarvela from comment #24)
> Added option intel_idle.max_cstate=1 for fi-byt-clapper, added host to
> https://intel-gfx-ci.01.org/hardware.html

Thanks!
Comment 26 Francesco Balestrieri 2018-11-23 11:17:04 UTC
Moving to medium since it's a meta-bug of sorts.

