Bug 102657

Summary: [BAT][BYT only] igt@* Incomplete - timeout/system hang
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED MOVED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs, marta.lofstedt, tomi.p.sarvela
Version: XOrg git   
Hardware: Other   
OS: All   
URL: https://bugzilla.kernel.org/show_bug.cgi?id=109051
Whiteboard: ReadyForDev
i915 platform: BYT
i915 features: GEM/Other
Bug Depends on:    
Bug Blocks: 105984    

Description Martin Peres 2017-09-11 10:33:38 UTC
On CI_DRM_3066, the machine fi-byt-j1900 hung hard while running igt@gem_exec_store@basic-default.

Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3066/fi-byt-j1900/igt@gem_exec_store@basic-default.html
Comment 1 Chris Wilson 2017-09-11 10:38:49 UTC
Byt does have a known hard hang somewhere between the punit and cstates. Maybe we should use intel_idle.max_cstate=1 for the farm for stability reasons?
Comment 2 Marta Löfstedt 2017-10-17 06:39:24 UTC
This also looks like a suspicious system hang:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3250/fi-byt-j1900/igt@gem_cpu_reloc@basic.html

Stray after last dmesg:
<7>[   65.159738] [IGT] gem_close_race: executing

run.log has:

[022/289] skip: 9, pass: 13 -   
FATAL: command execution failed
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)
Comment 3 Marta Löfstedt 2017-11-01 08:13:13 UTC
*** Bug 103411 has been marked as a duplicate of this bug. ***
Comment 4 Marta Löfstedt 2017-11-01 08:15:20 UTC
*** Bug 102619 has been marked as a duplicate of this bug. ***
Comment 5 Marta Löfstedt 2017-11-01 08:16:04 UTC
*** Bug 102547 has been marked as a duplicate of this bug. ***
Comment 6 Marta Löfstedt 2017-11-21 06:40:02 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3994/fi-byt-j1900/igt@chamelium@dp-edid-read.html

This looks weird.

This is all in dmesg:
<5>[   47.814867] owatch: Using watchdog device /dev/watchdog0
<5>[   47.815313] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   47.818658] owatch: timeout for /dev/watchdog0 set to 100 (requested 100)
<6>[   62.849318] Console: switching to colour dummy device 80x25
<7>[   62.849812] [IGT] chamelium: executing
<7>[   65.263503] [IGT] chamelium: exiting, ret=77
<7>[   65.474299] [IGT] chamelium: executing

run.log:
[000/289]  |                      
skip: igt/chamelium/dp-hpd-fast

[001/289] skip: 1 |
running: igt/chamelium/dp-edid-read

[001/289] skip: 1 /                
FATAL: command execution failed
java.io.EOFException
Comment 7 Marta Löfstedt 2017-12-05 13:29:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3458/fi-byt-j1900/igt@gem_exec_reloc@basic-gtt-cpu.html

last dmesg:
<4>[  216.579095] Setting dangerous option reset - tainting kernel
<7>[  216.580828] [IGT] gem_exec_reloc: exiting, ret=0
<7>[  216.765572] [IGT] gem_exec_reloc: executing

run.log:
running: igt/gem_exec_reloc/basic-gtt-cpu

[073/288] skip: 9, pass: 64 /            
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3458/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 548 seconds
Rebooting fi-byt-j1900
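
The triage pattern used in these comments (take the last "running:" line that precedes "FATAL: command execution failed" in run.log) could be sketched in Python roughly as follows. This is only an illustration: the marker strings are copied from the excerpts in this report, while the function name and return shape are hypothetical and not part of the CI tooling.

```python
import re

# Marker strings taken from the run.log excerpts quoted in this bug.
# Everything else (names, return shape) is a hypothetical illustration.
RUNNING_RE = re.compile(r"^running:\s*(\S+)")
FATAL_MARKER = "FATAL: command execution failed"


def last_test_before_fatal(run_log_text):
    """Return the test from the last 'running:' line seen before the
    FATAL marker, or None if the log has no FATAL marker."""
    current = None
    for line in run_log_text.splitlines():
        match = RUNNING_RE.match(line.strip())
        if match:
            current = match.group(1)
        elif FATAL_MARKER in line:
            return current
    return None


sample = """\
running: igt/gem_exec_reloc/basic-gtt-cpu

[073/288] skip: 9, pass: 64 /
FATAL: command execution failed
java.io.EOFException
"""
print(last_test_before_fatal(sample))
```

A log that ends without the FATAL marker yields None, so a normally completed run is not reported as an incomplete.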
Comment 8 Marta Löfstedt 2017-12-07 07:20:48 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3460/fi-byt-j1900/igt@gem_exec_reloc@basic-gtt-read.html

last dmesg:
<4>[  223.768697] Setting dangerous option reset - tainting kernel
<7>[  223.770112] [IGT] gem_exec_reloc: exiting, ret=0
<7>[  223.965041] [IGT] gem_exec_reloc: executing

run.log:
running: igt/gem_exec_reloc/basic-gtt-read

[075/288] skip: 9, pass: 66 \             
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3460/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 553 seconds
Rebooting fi-byt-j1900
Comment 9 Marta Löfstedt 2017-12-13 06:32:44 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3502/fi-byt-j1900/igt@gem_exec_gttfill@basic.html

last dmesg:
<7>[  187.956708] [IGT] gem_exec_flush: starting subtest basic-wb-set-default
<7>[  193.355585] [IGT] gem_exec_flush: exiting, ret=0
<7>[  193.547687] [IGT] gem_exec_gttfill: executing

run.log:
running: igt/gem_exec_gttfill/basic

[064/288] skip: 9, pass: 55 |      
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3502/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 555 seconds
Rebooting fi-byt-j1900
Comment 10 Marta Löfstedt 2018-01-02 07:15:11 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3554/fi-byt-j1900/igt@gem_exec_reloc@basic-gtt-read-noreloc.html

run.log:
running: igt/gem_exec_reloc/basic-gtt-read-noreloc

[084/288] skip: 9, pass: 75 |                     
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3554/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 548 seconds
Rebooting fi-byt-j1900

dmesg:
<7>[  223.809430] [IGT] gem_exec_reloc: starting subtest basic-cpu-read-noreloc
<4>[  223.816317] Setting dangerous option reset - tainting kernel
<7>[  223.817618] [IGT] gem_exec_reloc: exiting, ret=0
<7>[  223.991672] [IGT] gem_exec_reloc: executing
Comment 11 Marta Löfstedt 2018-01-04 07:27:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3595/fi-byt-j1900/igt@gem_exec_reloc@basic-write-cpu.html

dmesg:
<7>[  246.092851] [IGT] gem_exec_reloc: starting subtest basic-gtt-read
<4>[  246.099140] Setting dangerous option reset - tainting kernel
<7>[  246.100254] [IGT] gem_exec_reloc: exiting, ret=0
Followed by stray.

run.log:
running: igt/gem_exec_reloc/basic-write-cpu

[076/288] skip: 9, pass: 67 |              
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3595/fi-byt-j1900/0 : FAILURE
CI_IGT_test runtime 551 seconds
Rebooting fi-byt-j1900
Comment 12 Jani Saarinen 2018-01-09 07:06:22 UTC
Reference (trybot): https://patchwork.freedesktop.org/series/36157/
Comment 13 Marta Löfstedt 2018-01-25 12:42:41 UTC
This is a Meta bug for incompletes on BYT.
Comment 14 Marta Löfstedt 2018-02-02 07:21:19 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3714/fi-byt-j1900/igt@gem_exec_reloc@basic-cpu-gtt-active.html

last dmesg:
<4>[  240.653543] Setting dangerous option reset - tainting kernel
<7>[  240.654440] [IGT] gem_exec_reloc: exiting, ret=0
<7>[  240.844582] [IGT] gem_exec_reloc: executing
Comment 16 Marta Löfstedt 2018-03-29 08:26:29 UTC
Last seen:  IGT_4302: 2018-02-26 / 249 runs ago
Comment 18 Marta Löfstedt 2018-04-04 06:53:39 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_10/fi-byt-j1900/igt@drm_import_export@flink.html

run.log:
running: igt/drm_import_export/flink

[26/98] skip: 13, pass: 13 -        
FATAL: command execution failed
...
Completed CI_IGT_test drmtip_10/fi-byt-j1900/25 : FAILURE
CI_IGT_test runtime 456 seconds
Rebooting fi-byt-j1900

dmesg:
<7>[  222.720447] [IGT] drm_import_export: executing
<7>[  222.738358] [IGT] drm_import_export: starting subtest flink
<5>[  223.568659] random: crng init done
Comment 19 Francesco Balestrieri 2018-06-12 12:51:51 UTC
Last seen 2 days, 10 hours ago according to Cibuglogger
Comment 20 Francesco Balestrieri 2018-08-14 07:41:39 UTC
Has anybody tried Chris' suggestion above: "Maybe we should use intel_idle.max_cstate=1 for the farm for stability reasons?"
Comment 21 Francesco Balestrieri 2018-10-10 07:06:54 UTC
Changing to NEEDINFO while waiting for the answer to the above question.
Comment 22 Martin Peres 2018-10-10 10:27:30 UTC
(In reply to Francesco Balestrieri from comment #20)
> Has anybody tried Chris' suggestion above: "Maybe we should use
> intel_idle.max_cstate=1 for the farm for stability reasons?"

Tomi, it seems this was never applied to the grub command line (https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4919/fi-byt-clapper/boot0.log).

Would it be possible to add it?
Comment 23 Francesco Balestrieri 2018-11-08 09:41:20 UTC
Ping?
Comment 24 Tomi Sarvela 2018-11-08 12:09:37 UTC
Added option intel_idle.max_cstate=1 for fi-byt-clapper, added host to https://intel-gfx-ci.01.org/hardware.html
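
One way to confirm that the option reached the kernel is to inspect the kernel command line, for example /proc/cmdline on the host or the command line printed in a boot log. A minimal Python sketch, assuming the command line is available as a single string; the helper name and the example command line are made up for illustration:

```python
def cmdline_param(cmdline, name):
    """Return the value of a name=value token on a kernel command line,
    or None if the parameter is absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            # Assumption of this sketch: if the parameter is repeated,
            # the last occurrence is the one reported.
            value = val
    return value


# Made-up example command line, not taken from any CI machine.
example = "BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro intel_idle.max_cstate=1"
print(cmdline_param(example, "intel_idle.max_cstate"))
```

On a live machine the same helper could be fed the contents of /proc/cmdline; a result of None would mean the option is not on the command line.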
Comment 25 Martin Peres 2018-11-08 13:01:10 UTC
(In reply to Tomi Sarvela from comment #24)
> Added option intel_idle.max_cstate=1 for fi-byt-clapper, added host to
> https://intel-gfx-ci.01.org/hardware.html

Thanks!
Comment 26 Francesco Balestrieri 2018-11-23 11:17:04 UTC
Moving to medium since it's a meta-bug of sorts.
Comment 27 Martin Peres 2019-11-29 17:26:05 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/45.
