System Environment:
--------------------------
Platform: SKL
Libdrm: (master)libdrm-2.4.60-31-g6f90b77ea903756c87ae614c093e3d816ebb26fc
Mesa: (master)50e9fa2ed69cb5f76f66231976ea789c0091a64d
Xserver: (master)xorg-server-1.17.0-72-gf1da6bf5d94911e78d2e27e6accf0c6e3aefb331
Xf86_video_intel: (master)2.99.917-256-gfbefc8f2bd4242c3f01b02e25276340237b34a88
Libva: (master)062a63932c0f1439aa587aa986bbcfb758ff38f2
Libva_intel_driver: (master)ed03aebc6e702dab65204cc1469eef0da73e2372
Kernel: (drm-intel-nightly)044307a99b418258ac0d775460d73b20b80277c1

Bug detailed description:
-----------------------------
The system hangs sporadically. When the full piglit suite is run for multiple rounds, the hang happens on a different case each time. Run the attached piglit case list, execute the result_list, and the system hangs. Then run list 2; it doesn't cause a system hang.

Reproduce steps:
----------------------------
1. xinit
2. Run the attached piglit list.
Created attachment 115046 [details] piglit case list
Created attachment 115047 [details] result list
Created attachment 115048 [details] piglit list 2
I am not sure whether this is a regression. A full piglit run also hits a GPU hang or system hang; see bug 89493 and bug 89037. All 3 bugs occur randomly.
It also happens on BSW.
Created attachment 115907 [details]
dmesg

Tested the latest mesa master branch and the latest drm-intel-nightly kernel on BSW. Running full piglit causes a GPU hang and then a system hang; dmesg attached.

[ 378.564175] [drm] GPU HANG: ecode 8:0:0x85dffdfb, in ext_framebuffer [7399], reason: Ring hung, action: reset
[ 378.564269] [drm:i915_reset_and_wakeup] resetting chip
[ 378.565766] drm/i915: Resetting chip after gpu hang
Created attachment 115908 [details]
dmesg(without kernel.printk="7417")

Comment 6's dmesg was captured with "sysctl -w kernel.printk="7417"". Retested without "sysctl -w kernel.printk="7417""; the call trace is clear.

[ 761.617901] BUG: unable to handle kernel paging request at 00007f9ee1236008
[ 761.704932] IP: [<ffffffff817ade61>] error_entry+0x1/0x5b
[ 761.773069] PGD 175b80067 PUD 0
[ 761.815214] Thread overran stack, or stack corrupted
[ 761.878022] Oops: 0002 [#1] SMP
[ 761.920190] Modules linked in: ipv6 dm_mod snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support serio_raw pcspkr i2c_i801 lpc_ich mfd_core snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore battery ac acpi_cpufreq i915 button
[ 762.231906] gmain[4824]: segfault at a940 ip 000000000000a940 sp 00007fea30e81d80 error 14 in accounts-daemon[400000+26000]
[ 762.376039] video drm_kms_helper drm
[ 762.430270] CPU: 1 PID: 5360 Comm: python Not tainted 4.1.0-rc3_drm-intel-nightly_056608_20150519+ #410
[ 762.547037] task: ffff880178216240 ti: ffff88006ae40000 task.ti: ffff88006ae40000
[ 762.640897] RIP: 0010:[<ffffffff817ade61>] [<ffffffff817ade61>] error_entry+0x1/0x5b
[ 762.739098] RSP: 0000:ffff88006ae43ef0 EFLAGS: 00010092
[ 762.806946] RAX: 0000000000000004 RBX: fa9af535ea216240 RCX: 0000000000000000
[ 762.896755] RDX: 00007f09f5cfc0c0 RSI: 00007f09f791a3b0 RDI: 00007f09fa4ee5e9
[ 762.986599] RBP: ec5ca7b044216240 R08: ffff880175474a80 R09: 00007f09e8022000
[ 763.076428] R10: ffff880178216620 R11: 9b96372c6f000000 R12: c490f629e7b55d88
[ 763.166257] R13: 9f4a419c87103000 R14: 286a50d5df040d87 R15: df02be047c216240
[ 763.256080] FS: 00007f09f5cfd700(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
[ 763.357378] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 763.430585] CR2: 00007f9ee1236008 CR3: 00000001754bf000 CR4: 00000000001006e0
[ 763.520480] Stack:
[ 763.548896] ffffffff817add1c 0000000000000000 ffff880178216240 ffffffff8103f557
[ 763.642527] ffff88017a411128 ffff88017a411128 0000000000000000 0000000001128fb0
[ 763.736242] 0000000001425ac0 00007f09f7cf3a70 f4e4bcc0d163fdd0 0d124f3036000000
[ 763.830004] Call Trace:
[ 763.863917] [<ffffffff817add1c>] ? page_fault+0xc/0x30
[ 763.931234] [<ffffffff8103f557>] ? task_stopped_code+0x3a/0x3a
[ 764.006943] [<ffffffff817ade61>] ? error_entry+0x1/0x5b
[ 764.075405] [<ffffffff817add1c>] ? page_fault+0xc/0x30
[ 764.142824] Code: 4c 8b 44 24 48 48
[ 764.181178] PANIC: double fault, error_code: 0x0
[ 764.181186] CPU: 2 PID: 5373 Comm: ext_framebuffer Not tainted 4.1.0-rc3_drm-intel-nightly_056608_20150519+ #410
[ 764.181192] task: ffff880178866240 ti: ffff88006ae54000 task.ti: ffff88006ae54000
[ 764.181195] RIP: 0010:[<ffffffff817add17>] [<ffffffff817add17>] page_fault+0x7/0x30
[ 764.181208] RSP: 0000:ffff8800201fffd8 EFLAGS: 00010096
[ 764.181209] RAX: 00000000817ace77 RBX: 0000000000000001 RCX: ffffffff817ace77
[ 764.181212] RDX: 000000000000a940 RSI: 0000000000000000 RDI: ffff880020200098
[ 764.181213] RBP: 0000000000000009 R08: 0000000000000000 R09: 0000000000000001
[ 764.181215] R10: 0000000000000034 R11: 0000000002104d60 R12: 00007f9ee8da2e10
[ 764.181218] R13: 00007f9ee8da0038 R14: 000000000000008e R15: 0000000000000000
[ 764.181220] FS: 00007f9ee8dc7780(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
[ 764.181222] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 764.181224] CR2: ffff8800201fffc8 CR3: 000000006e080000 CR4: 00000000001006e0
[ 764.181226] Stack:
[ 765.382838] 8b 44 24 50 48 8b 4c 24 58 48 8b 54 24 60 48 8b 74 24 68 48 8b 7c 24 70 48 81 c4 80 00 00 00 e9 10 f0 ff ff fc <4c> 89 5c 24 38 4c 89 54 24 40 4c 89 4c 24 48 4c 89 44 24 50 48
[ 765.603067] RIP [<ffffffff817ade61>] error_entry+0x1/0x5b
[ 765.674383] RSP <ffff88006ae43ef0>
[ 765.721595] CR2: 00007f9ee1236008
[ 765.766899] BUG: unable to handle kernel paging request at 0000000000010092
[ 765.856096] IP: [<ffffffff81127340>] __d_lookup_rcu+0x65/0x123
[ 765.931681] PGD 1754bd067 PUD 179d7e067 PMD 0
[ 765.990816] Oops: 0000 [#2] SMP
[ 766.035223] Modules linked in: ipv6 dm_mod snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support serio_raw pcspkr i2c_i801 lpc_ich mfd_core snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore battery ac acpi_cpufreq i915 button video drm_kms_helper drm
[ 766.390273] CPU: 1 PID: 5380 Comm: kworker/u8:1 Not tainted 4.1.0-rc3_drm-intel-nightly_056608_20150519+ #410
[ 766.515404] task: ffff8801782149b0 ti: ffff88006aec8000 task.ti: ffff88006aec8000
[ 766.611410] RIP: 0010:[<ffffffff81127340>] [<ffffffff81127340>] __d_lookup_rcu+0x65/0x123
[ 766.716978] RSP: 0018:ffff88006aecbb88 EFLAGS: 00010206
[ 766.786973] RAX: 0000000000000003 RBX: 0000000000010096 RCX: 000000000000000d
[ 766.878871] RDX: 0000000000000000 RSI: ffff88006aecbd88 RDI: ffff880076938300
[ 766.970767] RBP: 000000000001008e R08: 8080808080808080 R09: fefefefefefefeff
[ 767.062698] R10: 2f2f2f2f2f2f2f2f R11: ffff88006aecbc04 R12: 000000037797fe36
[ 767.154668] R13: ffff880002cc701d R14: ffff88006aecbd88 R15: ffff880076938300
[ 767.246650] FS: 0000000000000000(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
[ 767.350164] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 767.425536] CR2: 0000000000010092 CR3: 00000001754bf000 CR4: 00000000001006e0
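For anyone triaging a dmesg like this after it has been pasted as one long run of text, a small helper can split it back into timestamped records and pick out the severe ones. This is only a sketch: the function names and the marker list are my own assumptions, chosen from the strings visible in this log, and the list is not exhaustive.

```python
import re

# Matches a kernel log timestamp such as "[ 761.617901]" or "[13489.597268]".
# Bracketed PIDs like "[4824]" have no decimal point, so they are not matched.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")

# Markers seen in this thread's logs; illustrative only, not exhaustive.
SEVERE_MARKERS = ("BUG:", "Oops:", "PANIC:", "GPU HANG:")


def split_records(text):
    """Split flattened dmesg text into (timestamp, message) tuples."""
    matches = list(TIMESTAMP.finditer(text))
    records = []
    for i, match in enumerate(matches):
        # A record runs until the next timestamp, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        records.append((float(match.group(1)), text[match.end():end].strip()))
    return records


def severe_records(text):
    """Return only the records whose message contains a severe marker."""
    return [
        (ts, msg)
        for ts, msg in split_records(text)
        if any(marker in msg for marker in SEVERE_MARKERS)
    ]
```

Feeding it the text of attachment 115908 should list the two BUG/Oops pairs and the double-fault PANIC in timestamp order, which makes it easier to see which fault came first.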
(In reply to lu hua from comment #6)
> Created attachment 115907 [details]
> dmesg
>
> Test the latest mesa master branch and the latest drm-intel-nightly kernel
> on BSW, Run full piglit, it causes GPU hang then system, attached the dmesg.
>

It causes GPU hang then system hang.
Created attachment 115912 [details]
gpu hang error state

Not sure if this helps, but here's an error state from the GPU hang. It is easy to reproduce: just run tests/quick.tests and at some point it will hang. Later on (as described in the comments) the whole system also hangs.
Tapani, do you know if it's always the same test which is failing for you? The error state you posted may be fixed by 5ae6c7bfce5c9fb91ab6cef2ea74a39af091d5f6 in master. Just a hunch (EU 0 in the GS went out to lunch).
(In reply to Ben Widawsky from comment #10)
> Tapani, do you know if it's always the same test which is failing for you?
> The error state you posted may be fixed by
> 5ae6c7bfce5c9fb91ab6cef2ea74a39af091d5f6 in master. Just a hunch (EU 0 in
> the GS went out to lunch).

OK thanks, I've pulled drm-nightly and current Mesa at 065978d and will try to reproduce. As additional info for the GPU hangs, I've found that the following tests cause hangs reliably:

arb_gpu_shader5/execution/sampler_array_indexing/gs-weird-uniforms.shader_test
arb_gpu_shader5/execution/sampler_array_indexing/fs-weird-uniforms.shader_test

Still not sure what makes the whole machine hang; will keep digging.
Created attachment 116141 [details]
error state

Tested on the latest mesa master branch, commit 10aacf5ae8f3e90e2f0967fbdcf96df93e346e20.
Ran full piglit on 1910 (rev 02): it has a GPU hang but no system hang; error state attached.
Ran full piglit on 190c (rev 03): I hit a system hang twice.
Ran full piglit on BSW twice: no system hang, but I do see a GPU hang.
To add to comment #11: it seems many of the dynamic sampler array indexing tests for fs and gs cause hangs, for example:

arb_gpu_shader5/execution/sampler_array_indexing/fs-simple.shader_test
arb_gpu_shader5/execution/sampler_array_indexing/fs-nonzero-base.shader_test

However, the vs ones seem to pass; maybe this helps.
Some other GPU hang reproducers (likely a few different issues here):

bin/tex-miplevel-selection *GradARB Cube -auto -fbo
bin/tex-miplevel-selection textureGrad CubeArray -auto -fbo
(In reply to lu hua from comment #12)
> Created attachment 116141 [details]
> error state
>
> Test on the latest mesa master branch,commit
> 10aacf5ae8f3e90e2f0967fbdcf96df93e346e20.
> Run full piglit case on 1910(rev 02), it has gpu hang but not system hang,
> attached the error state.
> Run full piglit case on 190c (rev 03), I meet twice system hang.
> Run full piglit case on BSW twice, I don't meet system but see GPU hang.

Based on this test result, remove BSW platform from this bug title.
(In reply to Tapani Pälli from comment #13 and comment #14) I'm able to reproduce these GPU hangs on SKL. No system hang.
(In reply to Anuj Phogat from comment #16)
> I'm able to reproduce these GPU hangs on SKL. No system hang.

sampler_array_indexing tests don't hang with latest mesa master. Failures are fixed by Neil's patch on mailing list:
http://patchwork.freedesktop.org/patch/50710/
Neil's patch does not help for SKL-Y. I still get a full system hang after 2 GPU hangs when running the quick.py piglit set. So the patch only seems to solve the problem for SKL-S.
Marta, can you please add the error state?
Could you please also test with this patch? http://patchwork.freedesktop.org/patch/50676/ Without that patch the GS tests for sampler array indexing are failing. On my SKL-Y machine once one of those GS tests fails some of the other sampler array indexing tests seem to start failing too. I wonder if it puts the hardware in some broken state.
(In reply to Neil Roberts from comment #20)
> Could you please also test with this patch?
>
> http://patchwork.freedesktop.org/patch/50676/
>
> Without that patch the GS tests for sampler array indexing are failing. On
> my SKL-Y machine once one of those GS tests fails some of the other sampler
> array indexing tests seem to start failing too. I wonder if it puts the
> hardware in some broken state.

With this patch applied, the GPU hang still exists.
I was under the impression that this issue is resolved, at least from the Mesa side. Is that not so, or is this issue not updated?
Can someone from QA please confirm it exists on master from today?
Tested on the latest mesa master branch; the hang still exists.

run:
bin/tex-miplevel-selection *GradARB Cube -auto -fbo

dmesg:
[13484.007161] [drm:i915_gem_open]
[13484.033555] [drm:i915_gem_context_create_ioctl] HW context 1 created
[13489.597268] [drm] stuck on render ring
[13489.597800] [drm] GPU HANG: ecode 9:0:0x85dffffb, in tex-miplevel-se [7157], reason: Ring hung, action: reset
[13489.597827] [drm:i915_reset_and_wakeup] resetting chip
[13489.600043] drm/i915: Resetting chip after gpu hang
[13489.600076] [drm:gen8_init_common_ring] Execlists enabled for render ring
[13489.600079] [drm:gen8_init_common_ring] Execlists enabled for bsd ring
[13489.600081] [drm:gen8_init_common_ring] Execlists enabled for blitter ring
[13489.600083] [drm:gen8_init_common_ring] Execlists enabled for video enhancement ring
[13491.597002] [drm] RC6 on
[13495.596368] [drm] stuck on render ring
[13495.596581] [drm] GPU HANG: ecode 9:0:0x85dffffb, in tex-miplevel-se [7157], reason: Ring hung, action: reset
[13495.596605] [drm:i915_reset_and_wakeup] resetting chip
[13495.598694] drm/i915: Resetting chip after gpu hang
[13495.598728] [drm:gen8_init_common_ring] Execlists enabled for render ring
[13495.598730] [drm:gen8_init_common_ring] Execlists enabled for bsd ring
[13495.598733] [drm:gen8_init_common_ring] Execlists enabled for blitter ring
[13495.598735] [drm:gen8_init_common_ring] Execlists enabled for video enhancement ring
[13495.601226] [drm:i915_gem_context_destroy_ioctl] HW context 1 destroyed
[13497.596361] [drm] RC6 on
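Since the same "GPU HANG" line keeps showing up across these logs, it may help to pull the fields out mechanically when comparing runs. The following is a rough sketch that assumes the message layout seen in this thread (ecode, process name, PID, reason, action); the regex and function names are my own and other kernel versions may format the line differently.

```python
import re
from collections import Counter

# Pattern modelled on the i915 hang lines quoted in this bug, e.g.
# "GPU HANG: ecode 9:0:0x85dffffb, in tex-miplevel-se [7157], reason: Ring hung, action: reset"
GPU_HANG = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"in (?P<comm>[^\[]+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def gpu_hangs(dmesg_text):
    """Return one dict of string fields per GPU HANG line found in the text."""
    return [match.groupdict() for match in GPU_HANG.finditer(dmesg_text)]


def hangs_per_process(dmesg_text):
    """Count hangs keyed by (process name, ecode)."""
    return Counter((h["comm"], h["ecode"]) for h in gpu_hangs(dmesg_text))
```

On the dmesg in this comment, that should report two hangs for tex-miplevel-se with ecode 9:0:0x85dffffb, which is a quick way to tell repeated hangs of one test apart from hangs spread over several tests.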
Please open a new bug for that failure. If the sporadic hangs are gone, please close this bug. Thanks.
Running full piglit on 190c (rev 03) still produces a system hang.
Okay. the miplevel selection test you mention (https://bugs.freedesktop.org/show_bug.cgi?id=90008#c24) has had issues on many platforms. I'd prefer to ignore that, or again, file a new bug for the SKL hang only.
Tracking the system hang separately. (https://bugs.freedesktop.org/show_bug.cgi?id=90854) Re-titling this
Based on this command:
./piglit-run.py -1 -x glean -x glx -x fbo-depth-array gpu /tmp
Ran 4 cycles on x-skly05: every cycle had a GPU hang, and two had a system hang. Attached dmesg and i915_error_state.
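For repeating this kind of multi-cycle run, something like the following could generate the command line once per cycle. It is a sketch only: the flags, exclusions, and profile are copied from the command above, while the per-cycle results directory (results_root/cycle-N) and the helper names are assumptions of mine, since the original run wrote straight to /tmp.

```python
import subprocess

# Exclusions copied from the command quoted in this comment.
EXCLUDES = ("glean", "glx", "fbo-depth-array")


def piglit_commands(cycles, results_root="/tmp"):
    """Return one piglit-run.py argv list per cycle.

    Assumption: each cycle gets its own hypothetical results directory,
    results_root/cycle-N, so later cycles do not overwrite earlier ones.
    """
    commands = []
    for cycle in range(1, cycles + 1):
        argv = ["./piglit-run.py", "-1"]
        for name in EXCLUDES:
            argv += ["-x", name]
        argv += ["gpu", f"{results_root}/cycle-{cycle}"]
        commands.append(argv)
    return commands


def run_cycles(cycles, results_root="/tmp"):
    """Run the cycles in order from the piglit checkout; return exit codes."""
    return [
        subprocess.run(argv, check=False).returncode
        for argv in piglit_commands(cycles, results_root)
    ]
```

Note that a full system hang will of course stop the loop, so saving dmesg and i915_error_state between cycles still has to happen outside this script.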
Created attachment 116568 [details] dmesg_piglit_skly05_gpuhang_notsystemhang_0618
Created attachment 116569 [details] dmesg_skly05_piglit_gpuhang_systemhang_0618
Created attachment 116570 [details] i915_error_state_skly05_piglit_gpuhang_notsystemhang
Can you please test master as of today. I pushed a patch which is fixing some other hangs.
Ran full piglit for 3 cycles on mesa commit 6844d6b7f8398a25eff511541b187afeb1199ce0; there was no GPU hang or system hang. Closing.
Created attachment 116920 [details]
dmesg system hang on fbo-depth-array

The system still hangs during the test ext_texture_array@fbo-depth-array.

Setup:
-------
Hardware
Platform: SKY LAKE Y A0
CPU : Intel(R) Core(TM) m3-6Y30 CPU @ 0.8GHz 4MB (family: 6, model: 78 stepping: 3)
MCP : SKL-Y D1 2+2 (or ULX-D1)
QDF : QYV3
CPU : SKL D0
Chipset PCH: Sunrise Point LP C1
CRB : SKY LAKE Y LPDDR3 RVP3 CRB FAB2
Reworks : All Mandatories + FBS02 & FBS03, O-06

Software
Linux : Ubuntu 14.04 LTS 64 bits
BIOS : SKLSE2R1.R00.X085.B02.150601337
ME FW : 11.0.0.1149
Ksc (EC FW): 1.15
Kernel 4.1-0 (drm-intel-nightly-2015-06-27)
Mesa: mesa-10.5.8 (master) 24b043aab73ce066ded6e4bc93f589008dfc8484
Xf86_video_intel: 2.99.917 (master) baec802b21387d04aebb10ac29e719a1800c5aa0
Libdrm: libdrm-2.4.61 (master) 203983f842a889b279698fdea46e83ee4450a1db
libva: libva-1.6.0.pre1 (master) 0f88a645ab3cea69d63371189e53cd465ab95a20
intel-driver: 1.6.0.pre1 (master) f3f74ea23601750078215fad04dde6748364b88d
xorg: 1.17.99
Xserver: xorg-server-1.17.2 (master) 2123f7682d522619f101b05fb75efa75dabbe371
Piglit: (master) 107318d835dbbf51af55c62abb2aee154822a4c7
Welcome Olivier. First, that is not a sporadic failure, and second there is already a bug for that test specifically: https://bugs.freedesktop.org/show_bug.cgi?id=91062