76050 – [PNV]igt/gem_persistent_relocs/forked-interruptible-faulting-reloc-thrash-inactive randomly causes system hang

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	All Linux (All)

Importance:	high major
Assignee:	Rodrigo Vivi
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-03-12 02:45 UTC by lu hua
Modified:	2017-10-06 14:39 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
dmesg(error) (73.28 KB, text/plain) 2014-03-12 02:45 UTC, lu hua	no flags
dmesg (9.75 KB, text/plain) 2014-04-14 06:01 UTC, lu hua	no flags

Description lu hua 2014-03-12 02:45:22 UTC

Created attachment 95634
dmesg(error)

System Environment:
--------------------------
Platform: Pineview
kernel:   drm-intel-next-queued/e8e6e6012d68c4967e8f26fdd39ac95c247d4789

Bug detailed description:
---------------------------
The test randomly hangs the system on Pineview with the -nightly or -queued kernel, in about 1 of 5 runs.
It also intermittently reports [drm:i915_reset] *ERROR* Failed to reset chip: -19.

output(hang):
IGT-Version: 1.5-g20087e7 (i686) (Linux: 3.14.0-rc6_drm-intel-next-queued_e8e6e6_20140311+ i686)

dmesg(hang):
[   48.431468] console [netcon0] enabled
[   48.431589] netconsole: network logging started
[   48.437292] console [netcon0] disabled
[   48.446037] netpoll: netconsole: local port 6665
[   48.446184] netpoll: netconsole: local IPv4 address 0.0.0.0
[   48.446343] netpoll: netconsole: interface 'enp2s0'
[   48.446485] netpoll: netconsole: remote port 6666
[   48.446622] netpoll: netconsole: remote IPv4 address 10.239.47.171
[   48.446798] netpoll: netconsole: remote ethernet address 74:d0:2b:95:69:65
[   48.446985] netpoll: netconsole: local IP 10.239.47.176
[   48.448644] console [netcon0] enabled
[   48.448767] netconsole: network logging started
[   55.945098] [drm:i915_gem_open],
[   55.945248] [drm:intel_crtc_cursor_set], cursor off
[   55.945354] [drm:intel_crtc_set_config], [CRTC:3] [NOFB]
[   55.945474] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:3], mode_changed=0, fb_changed=0
[   55.945680] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:4]
[   55.945841] [drm:intel_crtc_cursor_set], cursor off
[   55.945945] [drm:intel_crtc_set_config], [CRTC:4] [FB:14] #connectors=1 (x y) (0 0)
[   55.946151] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:4], mode_changed=0, fb_changed=0
[   55.949340] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:4]
[   55.952422] [drm:i915_gem_open],
[   63.707022] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle
[  133.707023] [drm] no progress on render ring
[  133.716255] [drm] GPU HANG: ecode -1:0x00000000, reason: Ring hung, action: reset
[  133.720785] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  133.725398] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  133.730065] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  133.734848] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  133.739694] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  157.002008] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 1, t=60002 jiffies, g=4000, c=3999, q=54)
[  157.002008] INFO: Stall ended before state dump start

output(error):
IGT-Version: 1.5-g20087e7 (i686) (Linux: 3.14.0-rc6_drm-intel-next-queued_e8e6e6_20140311+ i686)
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 22, Invalid argument
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Last errno: 5, Input/output error
Failed assertion: ret == 0
child 0 failed with exit status 99
Subtest forked-interruptible-faulting-reloc-thrash-inactive: FAIL
gem_reloc_vs_gpu: drmtest.c:1296: children_exit_handler: Assertion `ret == 0' failed.
Aborted (core dumped)
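
The error output above mixes errno 5 (EIO on Linux) with a single errno 22 (EINVAL). When comparing runs, tallying the "Last errno" lines can make that mix easier to see. The following is a minimal sketch only; the helper name `tally_errnos` is hypothetical and not part of IGT.

```python
import re
from collections import Counter


def tally_errnos(text):
    """Count how often each errno appears in IGT 'Last errno: N, ...' lines.

    Hypothetical helper for illustration; it only looks at the literal
    'Last errno: <number>' pattern seen in the output above.
    """
    return Counter(int(n) for n in re.findall(r"Last errno: (\d+)", text))
```

Feeding it the saved test output as a string returns a mapping from errno value to the number of occurrences.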


Reproduce steps:
-------------------------
1. ./gem_reloc_vs_gpu --run-subtest forked-interruptible-faulting-reloc-thrash-inactive
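
Since the failure is intermittent (about 1 in 5 runs as reported above), repeating the subtest and counting failed runs can help estimate the rate. This is a rough sketch under assumptions: the function name `count_failures` and the 600-second timeout are illustrative and not part of IGT. A hard system hang will take the script down too, so only failed or timed-out runs are counted.

```python
import subprocess


def count_failures(cmd, runs=5, timeout=600):
    """Run cmd `runs` times; return how many runs failed or timed out.

    Hypothetical helper; the timeout value is an arbitrary assumption.
    """
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(cmd, timeout=timeout)
            if result.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            # A run that never returns is treated as a failure.
            failures += 1
    return failures


# Example invocation with the command from the reproduce step:
# count_failures(["./gem_reloc_vs_gpu", "--run-subtest",
#                 "forked-interruptible-faulting-reloc-thrash-inactive"])
```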

Comment 1 Chris Wilson 2014-03-12 07:57:40 UTC

A real gpu hang without the error state attached. Can we have it please?
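
The dmesg above reports that the crash dump is saved to /sys/class/drm/card0/error. If the machine stays responsive long enough, that file can be copied to persistent storage right after the hang. A minimal sketch, assuming the path from the log is readable (usually as root); the helper name `save_error_state` and the output filename are hypothetical.

```python
from pathlib import Path


def save_error_state(src="/sys/class/drm/card0/error",
                     dst="i915_error_state.txt"):
    """Copy the GPU error state to a regular file; return bytes written.

    The default src is the path printed in the dmesg above; dst is an
    arbitrary, assumed filename.
    """
    data = Path(src).read_bytes()
    Path(dst).write_bytes(data)
    return len(data)
```

The resulting file can then be attached to the bug report.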

Comment 2 lu hua 2014-03-28 05:12:15 UTC

The system hangs too quickly for me to capture the error state.

Comment 3 Daniel Vetter 2014-04-11 16:43:10 UTC

Can you please retest with latest -nightly? The refcount fix from Chris might help ...

Comment 4 lu hua 2014-04-14 06:01:28 UTC

Created attachment 97328
dmesg

It still causes a system hang on the latest -nightly kernel.

Comment 5 lu hua 2014-04-14 06:02:00 UTC

output:
IGT-Version: 1.6-g99b8f80 (i686) (Linux: 3.14.0_drm-intel-nightly_cf8c74_20140414+ i686)

Comment 6 Chris Wilson 2014-05-08 12:26:02 UTC

Have you run your pnv box through memtest recently?

Comment 7 lu hua 2014-05-12 05:46:45 UTC

Ran 10 cycles on the latest -nightly kernel and it works well. Closing.

Comment 8 lu hua 2014-05-12 05:46:57 UTC

Verified. Fixed.

Comment 9 Elizabeth 2017-10-06 14:39:26 UTC

Closing old verified bug.
