Bug 76050 - [PNV]igt/gem_persistent_relocs/forked-interruptible-faulting-reloc-thrash-inactive randomly causes system hang
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: high major
Assignee: Rodrigo Vivi
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-12 02:45 UTC by lu hua
Modified: 2017-10-06 14:39 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg(error) (73.28 KB, text/plain)
2014-03-12 02:45 UTC, lu hua
dmesg (9.75 KB, text/plain)
2014-04-14 06:01 UTC, lu hua

Description lu hua 2014-03-12 02:45:22 UTC
Created attachment 95634
dmesg(error)

System Environment:
--------------------------
Platform: Pineview
kernel:   drm-intel-next-queued/e8e6e6012d68c4967e8f26fdd39ac95c247d4789

Bug detailed description:
---------------------------
The test randomly hangs the system on Pineview with the -nightly or -queued kernel. It happens in about 1 in 5 runs.
It also occasionally logs [drm:i915_reset] *ERROR* Failed to reset chip: -19.

output(hang):
IGT-Version: 1.5-g20087e7 (i686) (Linux: 3.14.0-rc6_drm-intel-next-queued_e8e6e6_20140311+ i686)

dmesg(hang):
[   48.431468] console [netcon0] enabled
[   48.431589] netconsole: network logging started
[   48.437292] console [netcon0] disabled
[   48.446037] netpoll: netconsole: local port 6665
[   48.446184] netpoll: netconsole: local IPv4 address 0.0.0.0
[   48.446343] netpoll: netconsole: interface 'enp2s0'
[   48.446485] netpoll: netconsole: remote port 6666
[   48.446622] netpoll: netconsole: remote IPv4 address 10.239.47.171
[   48.446798] netpoll: netconsole: remote ethernet address 74:d0:2b:95:69:65
[   48.446985] netpoll: netconsole: local IP 10.239.47.176
[   48.448644] console [netcon0] enabled
[   48.448767] netconsole: network logging started
[   55.945098] [drm:i915_gem_open],
[   55.945248] [drm:intel_crtc_cursor_set], cursor off
[   55.945354] [drm:intel_crtc_set_config], [CRTC:3] [NOFB]
[   55.945474] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:3], mode_changed=0, fb_changed=0
[   55.945680] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:4]
[   55.945841] [drm:intel_crtc_cursor_set], cursor off
[   55.945945] [drm:intel_crtc_set_config], [CRTC:4] [FB:14] #connectors=1 (x y) (0 0)
[   55.946151] [drm:intel_set_config_compute_mode_changes], computed changes for [CRTC:4], mode_changed=0, fb_changed=0
[   55.949340] [drm:intel_modeset_stage_output_state], [CONNECTOR:5:LVDS-1] to [CRTC:4]
[   55.952422] [drm:i915_gem_open],
[   63.707022] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle
[  133.707023] [drm] no progress on render ring
[  133.716255] [drm] GPU HANG: ecode -1:0x00000000, reason: Ring hung, action: reset
[  133.720785] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  133.725398] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  133.730065] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  133.734848] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  133.739694] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  157.002008] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 1, t=60002 jiffies, g=4000, c=3999, q=54)
[  157.002008] INFO: Stall ended before state dump start

output(error):
IGT-Version: 1.5-g20087e7 (i686) (Linux: 3.14.0-rc6_drm-intel-next-queued_e8e6e6_20140311+ i686)
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 22, Invalid argument
Failed assertion: ret == 0
Test assertion failure function gem_execbuf, file drmtest.c:581:
Test assertion failure function gem_execbuf, file drmtest.c:581:
Last errno: 5, Input/output error
Failed assertion: ret == 0
Last errno: 5, Input/output error
Failed assertion: ret == 0
child 0 failed with exit status 99
Subtest forked-interruptible-faulting-reloc-thrash-inactive: FAIL
gem_reloc_vs_gpu: drmtest.c:1296: children_exit_handler: Assertion `ret == 0' failed.
Aborted (core dumped)


Reproduce steps:
-------------------------
1. ./gem_reloc_vs_gpu --run-subtest forked-interruptible-faulting-reloc-thrash-inactive
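
Because the failure is intermittent, the reproduce step can be wrapped in a loop that saves the GPU error state after every failing run. The following is only a minimal sketch: the error-state path is the one printed in the dmesg above, while `run_and_capture`, `out_dir`, and the file naming are hypothetical helpers introduced for illustration. It only helps when the test fails but the machine stays alive; after a full system hang nothing can be saved.

```python
import subprocess
from pathlib import Path

# Path taken from the dmesg above ("GPU crash dump saved to ...").
ERROR_STATE = Path("/sys/class/drm/card0/error")


def run_and_capture(cmd, runs=5, error_state=ERROR_STATE,
                    out_dir=Path("error-states")):
    """Run cmd `runs` times; after each failing run, copy the error state.

    Returns the 1-based indices of the runs that exited non-zero.
    The function name, out_dir, and file naming are hypothetical.
    """
    failures = []
    for i in range(1, runs + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            continue
        failures.append(i)
        if error_state.exists():
            out_dir.mkdir(parents=True, exist_ok=True)
            # Read and write explicitly instead of using a copy helper,
            # since sysfs files do not report a meaningful size.
            (out_dir / f"run{i}.error").write_bytes(error_state.read_bytes())
    return failures


# Example call (reading the error state usually requires root):
# run_and_capture(["./gem_reloc_vs_gpu", "--run-subtest",
#                  "forked-interruptible-faulting-reloc-thrash-inactive"])
```

Each saved file can then be attached to the bug as the requested error state.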
Comment 1 Chris Wilson 2014-03-12 07:57:40 UTC
A real gpu hang without the error state attached. Can we have it please?
Comment 2 lu hua 2014-03-28 05:12:15 UTC
The system hangs too quickly; I can't capture the error state.
Comment 3 Daniel Vetter 2014-04-11 16:43:10 UTC
Can you please retest with latest -nightly? The refcount fix from Chris might help ...
Comment 4 lu hua 2014-04-14 06:01:28 UTC
Created attachment 97328
dmesg

It still causes a system hang on the latest -nightly kernel.
Comment 5 lu hua 2014-04-14 06:02:00 UTC
output:
IGT-Version: 1.6-g99b8f80 (i686) (Linux: 3.14.0_drm-intel-nightly_cf8c74_20140414+ i686)
Comment 6 Chris Wilson 2014-05-08 12:26:02 UTC
Have you run your pnv box through memtest recently?
Comment 7 lu hua 2014-05-12 05:46:45 UTC
Ran 10 cycles on the latest -nightly kernel; it works well. Closing.
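
For context on how much 10 clean cycles say about an intermittent failure, here is a rough calculation. It assumes that each run fails independently at the 1-in-5 rate given in the description, which is a simplification; the function names are illustrative only.

```python
import math


def prob_all_pass(fail_rate, runs):
    """Chance that `runs` independent runs all pass despite the bug."""
    return (1.0 - fail_rate) ** runs


def runs_needed(fail_rate, max_chance):
    """Smallest run count whose all-pass chance is below max_chance."""
    return math.ceil(math.log(max_chance) / math.log(1.0 - fail_rate))


p = prob_all_pass(1 / 5, 10)
print(f"chance of 10 clean runs with the bug still present: {p:.3f}")  # 0.107
print("runs needed to get below 1%:", runs_needed(1 / 5, 0.01))  # 21
```

Under that assumption, 10 clean cycles would still occur about 11% of the time on an unfixed kernel, and roughly 21 cycles would be needed to bring that below 1%.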
Comment 8 lu hua 2014-05-12 05:46:57 UTC
Verified. Fixed.
Comment 9 Elizabeth 2017-10-06 14:39:26 UTC
Closing old verified.