Bug 89037 - [SKL]Piglit spec_EXT_texture_array_copyteximage_1D_ARRAY_samples=2 sporadically causes GPU hang
Status: VERIFIED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: All Linux (All)
Importance: high critical
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-09 09:20 UTC by lu hua
Modified: 2015-03-09 06:08 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg (19.25 KB, text/plain)
2015-02-09 09:20 UTC, lu hua
dmesg(345e8cc84) (124.88 KB, text/plain)
2015-02-13 01:53 UTC, lu hua
output(GradARB_Cube) (16.55 KB, text/plain)
2015-02-15 05:20 UTC, lu hua
i915_error_state(GradARB_Cube)(zip) (247.15 KB, application/octet-stream)
2015-02-15 05:23 UTC, lu hua

Description lu hua 2015-02-09 09:20:45 UTC
Created attachment 113275
dmesg

System Environment:
--------------------------
Platform: SKL
Libdrm:         (master)libdrm-2.4.59-8-gccbb9aa887f992359335ecf2d26919b04e14e63f
Mesa:           (master)8030e269e911c4f90a44d9a77eb342dd2657d229
Xserver:         (master)xorg-server-1.17.0
Xf86_video_intel:  (master)2.99.917-94-g26ba2ba6e7169a099558092f31eaf97944cc3562
Libva:          (master)f9741725839ea144e9a6a1827f74503ee39946c3
Libva_intel_driver:     (master)9a20d6c34cb65e5b85dd16d6c8b3a215c5972b18
kernel: drm-intel-nightly/b4442ee4e150506cebeee72249efc566c5f14bbe(0209)

Bug detailed description:
---------------------------
The test causes a GPU hang on SKL with the Mesa master branch. If the machine is then left for about 5 minutes, the whole system hangs.
We hit this in our first test cycle on the upstream tree.

output:
Texture target = GL_TEXTURE_1D_ARRAY, Internal format = GL_INTENSITY
Texture target = GL_TEXTURE_1D_ARRAY, Internal format = GL_DEPTH_COMPONENT
Probe color at (288,0)
  Expected: 0.750000 0.750000 0.750000 1.000000
  Observed: 0.000000 0.000000 0.000000 1.000000
Probe color at (288,0)
  Expected: 0.750000 0.750000 0.750000 1.000000
  Observed: 0.400000 0.400000 0.400000 1.000000
Probe color at (288,0)
  Expected: 0.750000 0.750000 0.750000 1.000000
  Observed: 0.400000 0.400000 0.400000 1.000000
Probe color at (288,0)
  Expected: 0.750000 0.750000 0.750000 1.000000
  Observed: 0.400000 0.400000 0.400000 1.000000
Probe color at (288,0)
  Expected: 0.600000 0.600000 0.600000 1.000000
  Observed: 0.145098 0.145098 0.145098 1.000000
Probe color at (288,0)
  Expected: 0.600000 0.600000 0.600000 1.000000
  Observed: 0.145098 0.145098 0.145098 1.000000
Probe color at (288,0)
  Expected: 0.600000 0.600000 0.600000 1.000000
  Observed: 0.145098 0.145098 0.145098 1.000000
Probe color at (288,0)
  Expected: 0.600000 0.600000 0.600000 1.000000
  Observed: 0.145098 0.145098 0.145098 1.000000
Probe color at (288,0)
  Expected: 0.450000 0.450000 0.450000 1.000000
  Observed: 0.780392 0.780392 0.780392 1.000000
Probe color at (288,0)
  Expected: 0.450000 0.450000 0.450000 1.000000
  Observed: 0.780392 0.780392 0.780392 1.000000
Probe color at (288,0)
  Expected: 0.450000 0.450000 0.450000 1.000000
  Observed: 0.780392 0.780392 0.780392 1.000000
Probe color at (288,0)
  Expected: 0.450000 0.450000 0.450000 1.000000
  Observed: 0.780392 0.780392 0.780392 1.000000
Probe color at (288,0)
  Expected: 0.300000 0.300000 0.300000 1.000000
  Observed: 0.019608 0.019608 0.019608 1.000000
Probe color at (288,0)
  Expected: 0.300000 0.300000 0.300000 1.000000
  Observed: 0.019608 0.019608 0.019608 1.000000
Probe color at (288,0)
  Expected: 0.300000 0.300000 0.300000 1.000000
  Observed: 0.019608 0.019608 0.019608 1.000000
Probe color at (288,0)
  Expected: 0.300000 0.300000 0.300000 1.000000
  Observed: 0.019608 0.019608 0.019608 1.000000
Texture target = GL_TEXTURE_1D_ARRAY, Internal format = GL_DEPTH_COMPONENT16
Probe color at (304,0)
  Expected: 0.750000 0.750000 0.750000 1.000000
  Observed: 0.000000 0.000000 0.000000 1.000000
intel_do_flush_locked failed: Input/output error
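
The probe output above repeats the same mismatch many times. As a reading aid, here is a minimal Python sketch that collapses such a log into distinct (position, expected, observed) triples with a count. The function name summarize_probes and the regular expression are assumptions made for illustration; they are not part of piglit.

```python
import re
from collections import Counter

# Matches one three-line probe record as printed in the log above.
PROBE_RE = re.compile(
    r"Probe color at \((\d+),(\d+)\)[ \t]*\n"
    r"[ \t]*Expected:[ \t]*([\d. ]+)\n"
    r"[ \t]*Observed:[ \t]*([\d. ]+)"
)


def summarize_probes(log):
    """Count how often each distinct probe mismatch appears in the log text."""
    counts = Counter()
    for x, y, expected, observed in PROBE_RE.findall(log):
        key = (
            (int(x), int(y)),
            tuple(float(v) for v in expected.split()),
            tuple(float(v) for v in observed.split()),
        )
        counts[key] += 1
    return counts
```

Printing the items of the returned Counter gives one line per distinct mismatch instead of the repeated blocks.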


call trace:
[  466.103461] Call Trace:
[  466.103914]  [<ffffffff81400806>] ? scsi_host_alloc_command+0x3d/0x9e
[  466.104374]  [<ffffffff8140095b>] ? scsi_get_command+0x16/0x128
[  466.104830]  [<ffffffff814086d1>] ? scsi_prep_fn+0x58/0x139
[  466.105284]  [<ffffffff81326452>] ? blk_peek_request+0xf7/0x216
[  466.105742]  [<ffffffff81408aa4>] ? scsi_request_fn+0x2f/0x5cc
[  466.106201]  [<ffffffff81323aa4>] ? __blk_run_queue+0x29/0x31
[  466.106659]  [<ffffffff81326b23>] ? blk_queue_bio+0x27d/0x2be
[  466.107117]  [<ffffffff8132493f>] ? generic_make_request+0x93/0xd0
[  466.107578]  [<ffffffff81324a7b>] ? submit_bio+0xff/0x11d
[  466.108040]  [<ffffffff8113aad9>] ? _submit_bh+0x104/0x122
[  466.108500]  [<ffffffff8113af0d>] ? ll_rw_block+0x6d/0x77
[  466.108957]  [<ffffffff8113bb02>] ? __breadahead+0x2c/0x40
[  466.109409]  [<ffffffff811b85ed>] ? __ext4_get_inode_loc+0x287/0x396
[  466.109860]  [<ffffffff811db35d>] ? ext4_ext_tree_init+0x2b/0x30
[  466.110308]  [<ffffffff811bc159>] ? ext4_reserve_inode_write+0x1c/0x7f
[  466.110758]  [<ffffffff811bc21d>] ? ext4_mark_inode_dirty+0x61/0x1cb
[  466.111211]  [<ffffffff811db35d>] ? ext4_ext_tree_init+0x2b/0x30
[  466.111662]  [<ffffffff811b7999>] ? __ext4_new_inode+0xf0d/0x110d
[  466.112115]  [<ffffffff811c2cc8>] ? ext4_lookup+0xf6/0x12d
[  466.112567]  [<ffffffff8111bbd7>] ? __lookup_hash+0x2a/0x31
[  466.113018]  [<ffffffff8111f47d>] ? kern_path_create+0x70/0x112
[  466.113471]  [<ffffffff811c49e5>] ? ext4_mkdir+0xdb/0x35c
[  466.113926]  [<ffffffff8110e981>] ? kmem_cache_alloc+0x27/0x113
[  466.114382]  [<ffffffff8111c8e7>] ? vfs_mkdir+0xac/0x128
[  466.114838]  [<ffffffff8112098b>] ? SyS_mkdirat+0x6b/0xa9
[  466.115295]  [<ffffffff817a0092>] ? system_call_fastpath+0x12/0x17
[  466.115753] Code: e9 4d 89 f8 4c 89 f1 48 89 ea 48 89 de 41 ff 14 24 49 83 c4 10 49 83 3c 24 00 eb 36 eb 38 49 63 44 24 20 4d 8b 04 24 48 8d 4a 01 <48> 8b 5c 05 00 48 89 e8 65 49 0f c7 08 0f 94 c0 84 c0 0f 85 78
[  466.116276] RIP  [<ffffffff8110ea33>] kmem_cache_alloc+0xd9/0x113
[  466.116772]  RSP <ffff88043b85b9c8>
[  466.121844] ---[ end trace b809efb3ce78ceb0 ]---

==Reproduce steps==
---------------------------- 
1. xinit
2. bin/copyteximage 1D_ARRAY -samples=2 -auto
3. Leave the machine idle for 5 minutes
Comment 1 Damien Lespiau 2015-02-09 15:12:05 UTC
This is probably something amiss in Mesa.
Comment 2 Neil Roberts 2015-02-09 17:59:09 UTC
This seems to be fixed if we implement WaDisable1DDepthStencil with the patch posted here:

http://lists.freedesktop.org/archives/mesa-dev/2015-February/076398.html
Comment 3 Gordon Jin 2015-02-10 01:21:33 UTC
With the kernel from the internal tree, this case fails but does not hang.
Comment 4 fangxun 2015-02-10 10:22:26 UTC
(In reply to Neil Roberts from comment #2)
> This seems to be fixed if we implement WaDisable1DDepthStencil with the
> patch posted here:
> 
> http://lists.freedesktop.org/archives/mesa-dev/2015-February/076398.html

It doesn't hang with the patch.
Comment 5 Neil Roberts 2015-02-10 18:07:38 UTC
I've pushed the patch to Mesa master

http://cgit.freedesktop.org/mesa/mesa/commit/?id=5b29b2922afe2b8167a589fc2896
Comment 6 lu hua 2015-02-13 01:52:41 UTC
Tested on Mesa 345e8cc8496b4e6c56105c7396e80d85a37e122c: some cases still cause a GPU hang, but no longer a system hang.
The GPU hang is sporadic; I ran 50 cycles.


[  877.815971] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... blitter ring idle
Comment 7 lu hua 2015-02-13 01:53:11 UTC
Created attachment 113431
dmesg(345e8cc84)
Comment 8 lu hua 2015-02-13 01:54:44 UTC
I find that many piglit cases cause a GPU hang.

./bin/glean -o -v -v -v -t +pixelFormats --quick
output:
----------------------------------------------------------------------
Test that all the various pixel formats/types (like
GL_BGRA/GL_UNSIGNED_SHORT_4_4_4_4_REV) operate correctly.
Test both glTexImage and glDrawPixels.
For textures, also test all the various internal texture formats.
Thousands of combinations are possible!

pixelFormats:  PASS rgba8, db, z24, s8, win+pmap, id 32
        20124 tests passed, 0 tests failed.


<3>[  398.754921] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... blitter ring idle

# cat /sys/kernel/debug/dri/0/i915_error_state
no error state collected
Comment 9 lu hua 2015-02-13 06:09:11 UTC
This error is similar to bug 87138, but I am not sure whether they are the same issue.
Comment 10 lu hua 2015-02-15 05:20:50 UTC
Created attachment 113501
output(GradARB_Cube)

spec_ARB_shader_texture_lod_execution_tex-miplevel-selection_*GradARB_Cube still hangs. It causes a GPU hang, and if the machine is then left for 5 minutes, the system hangs.
Comment 11 lu hua 2015-02-15 05:23:20 UTC
Created attachment 113502
i915_error_state(GradARB_Cube)(zip)
Comment 12 Ben Widawsky 2015-02-16 20:04:39 UTC
As you stated, the error state you have attached doesn't correspond to the bug title. You've posted error state for tex-miplevel-selection.

I think this bug has been fixed by Neil already. Please open a new bug for the failures you are seeing, and re-open this if exactly this test hangs.
Comment 13 lu hua 2015-02-27 07:50:30 UTC
Ran bin/copyteximage 1D_ARRAY -samples=2 -auto for 15 cycles on the latest Mesa master branch (commit 1a93e7690dc902). It still causes a GPU hang, but no system hang, and no error state is collected.
Reopening.

dmesg:
<3>[  563.789721] [drm:i915_hangcheck_elapsed [i915]] *ERROR* Hangcheck timer elapsed... blitter ring idle
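
Because the hang is sporadic and no error state is collected, the hangcheck line in dmesg is the only signal available when running many cycles. A minimal Python sketch for picking those lines out of captured dmesg text follows. The function name find_hangchecks and the pattern are assumptions based only on the line format quoted above; the exact message may differ between kernel versions.

```python
import re

# Pattern modelled on the dmesg line quoted above (assumed format).
HANGCHECK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_elapsed.*?\*ERROR\*\s+(.*)"
)


def find_hangchecks(dmesg_text):
    """Return (timestamp, message) pairs for each hangcheck error line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = HANGCHECK_RE.search(line)
        if match:
            hits.append((float(match.group(1)), match.group(2).strip()))
    return hits
```

Feeding it the dmesg output captured after each test cycle and comparing the returned timestamps against the previous cycle would show which cycles triggered a new hang.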
Comment 14 Ben Widawsky 2015-03-05 17:54:07 UTC
Please refile as a kernel bug if this happens again.
Comment 15 lu hua 2015-03-09 06:08:19 UTC
Tracked in bug 89493

