Bug 108767 - [CI][Runner] Abort on network down
Summary: [CI][Runner] Abort on network down
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: IGT (show other bugs)
Version: XOrg git
Hardware: Other All
: high normal
Assignee: Petri Latvala
QA Contact:
URL:
Whiteboard: ReadyForDev
Keywords:
: 108680 108953 (view as bug list)
Depends on:
Blocks:
 
Reported: 2018-11-16 14:38 UTC by Martin Peres
Modified: 2018-12-07 16:19 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Peres 2018-11-16 14:38:02 UTC
When the network is down, the external watchdogs are bound to come and kill the machine.

Protect ourselves against these random incompletes by stopping the execution early and exiting.

The current plan was to allow the configuration of a host to ping before executing a new binary to make sure that the network is still up. The host to ping should be specified in igtrc. If none are specified, no pings should be made.
Comment 2 Jani Saarinen 2018-11-21 11:43:35 UTC
Dropping priority, new feature request.
Comment 3 Martin Peres 2018-12-03 16:26:11 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5244/fi-kbl-7560u/igt@gem_basic@create-close.html

No logs, but it seems like it got stuck very early on, as can be seen in the run log:

[001/332] (960s left) core_auth (basic-auth)
FATAL: command execution failed
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: java.io.IOException: Backing channel 'fi-kbl-7560u' is disconnected.
	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:214)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)
	at com.sun.proxy.$Proxy64.isAlive(Unknown Source)
	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1144)
	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1136)
	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
	at hudson.model.Build$BuildExecution.build(Build.java:206)
	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
	at hudson.model.Run.execute(Run.java:1810)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:429)
FATAL: Unable to delete script file /tmp/jenkins3295738120970564664.sh
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on fi-kbl-7560u failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:948)
	at hudson.FilePath.act(FilePath.java:1070)
	at hudson.FilePath.act(FilePath.java:1059)
	at hudson.FilePath.delete(FilePath.java:1563)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:123)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
	at hudson.model.Build$BuildExecution.build(Build.java:206)
	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
	at hudson.model.Run.execute(Run.java:1810)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:429)
Build step 'Execute shell' marked build as failure
Set build name.
Workspace is not accessible
ERROR: no workspace for CI_IGT_test #2662333
Notifying upstream projects of job completion
Finished: FAILURE
Completed CI_IGT_test CI_DRM_5244/fi-kbl-7560u/0 : FAILURE
CI_IGT_test runtime 6 seconds
Rebooting fi-kbl-7560u
Comment 4 Lakshmi 2018-12-07 16:16:09 UTC
*** Bug 108953 has been marked as a duplicate of this bug. ***
Comment 5 Lakshmi 2018-12-07 16:19:58 UTC
*** Bug 108680 has been marked as a duplicate of this bug. ***


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.