The machine fi-hsw-4770r hard-hung when running igt@gem_exec_flush@basic-batch-kernel-default-cmd on CI_DRM_2671. The only thing that looks a little suspicious in the logs is this:

[ 104.040011] [IGT] gem_exec_fence: starting subtest await-hang-default
[ 106.726854] [drm:missed_breadcrumb [i915]] vecs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x5c/0x80 [i915], irq posted? no, current seqno=4f3d, last=4f3e
[ 112.762486] [drm] GPU HANG: ecode 7:0:0xe757feff, in gem_exec_fence [1670], reason: Hang on rcs0, action: reset
[ 112.762778] [drm:i915_reset_and_wakeup [i915]] resetting chip
[ 112.762851] drm/i915: Resetting chip after gpu hang
[ 112.763002] [drm:i915_gem_reset [i915]] context gem_exec_fence[1670]/0 marked guilty (score 10) banned? no
[ 112.763035] [drm:i915_gem_reset [i915]] resetting rcs0 to restart from tail of request 0x5966c
[ 112.763205] [drm:intel_print_rc6_info [i915]] Enabling RC6 states: RC6 on
[ 112.763379] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 0
[ 112.774296] [IGT] gem_exec_fence: exiting, ret=0

Full logs: https://intel-gfx-ci.01.org/CI/CI_DRM_2671/fi-hsw-4770r/igt@gem_exec_flush@basic-batch-kernel-default-cmd.html
(In reply to Martin Peres from comment #0)
> The machine fi-hsw-4770r hard-hanged when running
> igt@gem_exec_flush@basic-batch-kernel-default-cmd on CI_DRM_2671. The only
> thing that looks a little suspicious in the logs is this:
>
>
> [ 104.040011] [IGT] gem_exec_fence: starting subtest await-hang-default
> [ 106.726854] [drm:missed_breadcrumb [i915]] vecs0 missed breadcrumb at
> intel_breadcrumbs_hangcheck+0x5c/0x80 [i915], irq posted? no, current
> seqno=4f3d, last=4f3e
> [ 112.762486] [drm] GPU HANG: ecode 7:0:0xe757feff, in gem_exec_fence
> [1670], reason: Hang on rcs0, action: reset
> [ 112.762778] [drm:i915_reset_and_wakeup [i915]] resetting chip
> [ 112.762851] drm/i915: Resetting chip after gpu hang
> [ 112.763002] [drm:i915_gem_reset [i915]] context gem_exec_fence[1670]/0
> marked guilty (score 10) banned? no
> [ 112.763035] [drm:i915_gem_reset [i915]] resetting rcs0 to restart from
> tail of request 0x5966c
> [ 112.763205] [drm:intel_print_rc6_info [i915]] Enabling RC6 states: RC6 on
> [ 112.763379] [drm:init_workarounds_ring [i915]] rcs0: Number of context
> specific w/a: 0
> [ 112.774296] [IGT] gem_exec_fence: exiting, ret=0

That's not suspicious as it was a hang test. The test in question (gem_exec_flush) is not recorded as even starting. There's no information here regarding the disappearance of the machine.
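For anyone triaging similar logs, the reasoning above (the GPU hang belongs to the gem_exec_fence hang test, not to gem_exec_flush) can be checked mechanically. The following is only a minimal sketch, not part of any CI tooling: the function name `attribute_hangs` and the regular expressions are illustrative assumptions based on the `[IGT]` marker lines quoted in this report.

```python
import re

# Assumed marker formats, taken from the log excerpt in this report:
#   "[IGT] gem_exec_fence: starting subtest await-hang-default"
#   "[IGT] gem_exec_fence: exiting, ret=0"
START = re.compile(r"\[IGT\] (\S+): starting subtest (\S+)")
EXIT = re.compile(r"\[IGT\] (\S+): exiting")


def attribute_hangs(lines):
    """For each 'GPU HANG' line, return the (test, subtest) that was
    running according to the most recent [IGT] start marker, or None
    if no test was marked as running."""
    current = None
    hangs = []
    for line in lines:
        match = START.search(line)
        if match:
            current = (match.group(1), match.group(2))
        elif EXIT.search(line):
            current = None
        elif "GPU HANG" in line:
            hangs.append(current)
    return hangs


log = [
    "[ 104.040011] [IGT] gem_exec_fence: starting subtest await-hang-default",
    "[ 112.762486] [drm] GPU HANG: ecode 7:0:0xe757feff, in gem_exec_fence [1670], reason: Hang on rcs0, action: reset",
    "[ 112.774296] [IGT] gem_exec_fence: exiting, ret=0",
]
print(attribute_hangs(log))
```

With the excerpt from this report, the hang is attributed to gem_exec_fence/await-hang-default, which is an intentional hang test, so it says nothing about why the machine later disappeared.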
(In reply to Chris Wilson from comment #1)
> (In reply to Martin Peres from comment #0)
> > The machine fi-hsw-4770r hard-hanged when running
> > igt@gem_exec_flush@basic-batch-kernel-default-cmd on CI_DRM_2671. The only
> > thing that looks a little suspicious in the logs is this:
> >
> >
> > [ 104.040011] [IGT] gem_exec_fence: starting subtest await-hang-default
> > [ 106.726854] [drm:missed_breadcrumb [i915]] vecs0 missed breadcrumb at
> > intel_breadcrumbs_hangcheck+0x5c/0x80 [i915], irq posted? no, current
> > seqno=4f3d, last=4f3e
> > [ 112.762486] [drm] GPU HANG: ecode 7:0:0xe757feff, in gem_exec_fence
> > [1670], reason: Hang on rcs0, action: reset
> > [ 112.762778] [drm:i915_reset_and_wakeup [i915]] resetting chip
> > [ 112.762851] drm/i915: Resetting chip after gpu hang
> > [ 112.763002] [drm:i915_gem_reset [i915]] context gem_exec_fence[1670]/0
> > marked guilty (score 10) banned? no
> > [ 112.763035] [drm:i915_gem_reset [i915]] resetting rcs0 to restart from
> > tail of request 0x5966c
> > [ 112.763205] [drm:intel_print_rc6_info [i915]] Enabling RC6 states: RC6 on
> > [ 112.763379] [drm:init_workarounds_ring [i915]] rcs0: Number of context
> > specific w/a: 0
> > [ 112.774296] [IGT] gem_exec_fence: exiting, ret=0
>
> That's not suspicious as it was a hang test.

OK, I'll try to remember that this one is also one of those tests.

> The test in question
> (gem_exec_flush) is not recorded as even starting. There's no information
> here regarding the disappearance of the machine.

Piglit probably only syncs after the execution of a test, not at the beginning of a test.
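To illustrate the point about syncing: if a runner only flushes its results to disk after a test finishes, a hard hang during the test can leave no trace that the test ever started. The sketch below is not Piglit's actual code and makes no claim about how Piglit behaves; `run_with_synced_marker` and the journal format are hypothetical names used only to show the idea of syncing a "starting" marker before the test runs.

```python
import os
import subprocess
import sys
import tempfile


def run_with_synced_marker(name, cmd, journal_path):
    """Record that a test is starting, force the record to disk,
    then run the test and record its exit status."""
    with open(journal_path, "a") as journal:
        journal.write(f"starting {name}\n")
        journal.flush()
        os.fsync(journal.fileno())  # the marker survives a hard hang

        result = subprocess.run(cmd)

        journal.write(f"finished {name} ret={result.returncode}\n")
        journal.flush()
        os.fsync(journal.fileno())
    return result.returncode


# Demo with a trivial command standing in for a real test binary.
with tempfile.TemporaryDirectory() as tmp:
    journal_file = os.path.join(tmp, "journal.txt")
    ret = run_with_synced_marker("demo", [sys.executable, "-c", "pass"], journal_file)
    with open(journal_file) as fh:
        contents = fh.read()
print(contents)
```

If the machine died while the command was running, the journal would still contain the "starting" line, which is exactly the information missing from this report.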
Martin, does this mean that this is not a bug? Or is it a bug, but in the IGT test?
(In reply to Ricardo from comment #3)
> does this mean that this is not a bug? Martin or is a bug but for the IGT
> test?

No, it is likely a bug in the kernel... but we don't have enough data yet. Hopefully, we can get more logs next time it happens.
OK, thanks. I will leave it open and set it to NEEDINFO.
Statistics: Failure rate 1/25 run(s) (4%)
Seen once 2017-05-30. Statistics: Failure rate 1/67 run(s) (1%)
Still only one failure at 2017-05-30. Statistics: Failure rate 1/100 run(s) (1%) - seems to be going towards resolved+worksforme...
Those hangs are the worst, because we have no idea what triggers them...

I understand your view, Jari, but I think we should keep this bug open until the end of the month and then close it. I can say that this has not been seen in the patchwork runs nor in the IGT runs.
(In reply to Martin Peres from comment #9)
> Those hangs are the worst, because we have no idea about what trigger them...
>
> I understand your view Jari, but I think we should keep this bug until the
> end of the month and then close it. I can say that this has not been seen in
> the patchwork runs nor the IGT runs.

Hello,

I'm closing this bug since the time mentioned above has passed. If the problem appears again, please share the information and change the status to REOPENED.

Thank you.