System Environment:
Platform: SKL
Kernel: (drm-intel-nightly)b4442ee4e150506cebeee72249efc566c5f14bbe
Libdrm: (master)libdrm-2.4.59-8-gccbb9aa887f992359335ecf2d26919b04e14e63f
Mesa: (master)345e8cc8496b4e6c56105c7396e80d85a37e122c
Xserver: (master)xorg-server-1.17.0
Xf86_video_intel: (master)2.99.917-100-g5b033d638bbf2c0b841088ca75f9eb8de5852cb5
Cairo: (master)70cc8f250b5669e757b4f044571ba0f71e3dea9e
Libva: (master)f9741725839ea144e9a6a1827f74503ee39946c3
Libva_intel_driver: (master)9a20d6c34cb65e5b85dd16d6c8b3a215c5972b18

Bug detailed description:
--------------------------------------------------
The system hangs when running etqw-demo on SKL. Adding i915.enable_ppgtt=0 does not help; it still fails. After a reboot, the system is often unable to load the USB driver. If I enable the serial console on the kernel command line (console=tty0 console=ttyS0,9600), the system cannot boot, so I cannot capture the dmesg info.

Error information:
[ 704.804125] [drm] stuck on render ring
[ 704.806299] [drm] GPU HANG: ecode 9:0:0x85dffffb, in etqw.x86 [7074], reason: Ring hung, action: reset
[ 704.806302] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 704.806304] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 704.806306] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 704.806308] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 704.806310] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 704.808042] drm/i915: Resetting chip after gpu hang
[ 706.804390] [drm] RC6 on
[ 708.806788] [drm] GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 712.807317] [drm] GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 716.807673] [drm] GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 720.808310] [drm] GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 724.808821] [drm] GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 728.809356] [drm] GPU HANG: ecode 9:-1:0x00000000, reason: Kicking stuck wait on render ring, action: continue
[ 729.542908] stack segment: 0000 [#1] SMP
[ 729.542946] Modules linked in: dm_mod ppdev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel pcspkr snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd i2c_i801 soundcore wmi battery parport_pc parport ac acpi_cpufreq i915 button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea
[ 729.543213] CPU: 3 PID: 7148 Comm: bash Tainted: G W 3.19.0-rc7_drm-intel-nightly_b4442e_20150208+ #198
[ 729.543283] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.86C.B069.R00.1501192136 01/19/2015
[ 729.543433] task: ffff880047e7b800 ti: ffff880047f64000 task.ti: ffff880047f64000
[ 729.543484] RIP: 0010:[<ffffffff8110dd86>] [<ffffffff8110dd86>] kmem_cache_alloc_trace+0xce/0x104
[ 729.543550] RSP: 0018:ffff880047f67d28 EFLAGS: 00210282
[ 729.543588] RAX: 0000000000000000 RBX: ffff880142d83b00 RCX: 000000000000a530
[ 729.543636] RDX: 000000000000a52f RSI: 00000000000000d0 RDI: ffff880149c03900
[ 729.543683] RBP: ff88003a89db7f00 R08: 0000000000015520 R09: ffff880047f67e50
[ 729.543729] R10: ffff880047f67e50 R11: 0000000000000000 R12: 00000000000000d0
[ 729.543777] R13: 0000000000000080 R14: ffffffff81151bba R15: ffff880149c03900
[ 729.543826] FS: 00007f48e8ff4740(0000) GS:ffff88014e4c0000(0000) knlGS:0000000000000000
[ 729.543881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 729.543920] CR2: 00007fff69e90860 CR3: 000000003a9a5000 CR4: 00000000003407e0
[ 729.543968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 729.544036] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 729.544082] Stack:
[ 729.544097] 00000000fffffff8 ffff880142d83b00 00000000fffffff8 ffffffff81bd42f0
[ 729.544155] ffff880144ca8001 ffff880047e7b800 0000000000000000 ffffffff81151bba
[ 729.544211] 0000000000000000 ffff880142d83b00 ffff880047f67fd8 ffff880047e7b800
[ 729.544267] Call Trace:
[ 729.544290] [<ffffffff81151bba>] ? load_elf_binary+0x41/0x15af
[ 729.544332] [<ffffffff8114f9ca>] ? load_misc_binary+0x4a/0x2cf
[ 729.544380] [<ffffffff811187f5>] ? copy_strings.isra.24+0x247/0x29a
[ 729.544427] [<ffffffff811188ef>] ? search_binary_handler+0x71/0x17a
[ 729.547541] [<ffffffff811199c2>] ? do_execveat_common.isra.29+0x46d/0x641
[ 729.550636] [<ffffffff81119bb9>] ? do_execve+0x23/0x28
[ 729.553732] [<ffffffff81119dbd>] ? SyS_execve+0x23/0x2a
[ 729.556791] [<ffffffff817a05e9>] ? stub_execve+0x69/0xa0
[ 729.559774] Code: 7e 08 45 89 e1 49 89 d8 4c 89 e9 48 89 ea 4c 89 fe 41 ff 16 49 83 c6 10 49 83 3e 00 eb 30 eb 32 49 63 47 20 4d 8b 07 48 8d 4a 01 <48> 8b 5c 05 00 48 89 e8 65 49 0f c7 08 0f 94 c0 84 c0 75 89 e9
[ 729.562907] RIP [<ffffffff8110dd86>] kmem_cache_alloc_trace+0xce/0x104
[ 729.566028] RSP <ffff880047f67d28>
[ 729.614868] ---[ end trace 773f3dc1692c6328 ]---
[ 730.200992] BUG: Bad rss-counter state mm:ffff880077b6c980 idx:1 val:2
[ 730.696968] stack segment: 0000 [#2] SMP
[ 730.701236] Modules linked in: dm_mod ppdev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel pcspkr snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd i2c_i801 soundcore wmi battery parport_pc parport ac acpi_cpufreq i915 button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea
[ 730.705933] CPU: 3 PID: 2979 Comm: jbd2/sda3-8 Tainted: G D W 3.19.0-rc7_drm-intel-nightly_b4442e_20150208+ #198
[ 730.710590] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.86C.B069.R00.1501192136 01/19/2015
[ 730.715138] task: ffff88014306a000 ti: ffff88009a6fc000 task.ti: ffff88009a6fc000
[ 730.719461] RIP: 0010:[<ffffffff8110ea33>] [<ffffffff8110ea33>] kmem_cache_alloc+0xd9/0x113
[ 730.723812] RSP: 0018:ffff88009a6ff998 EFLAGS: 00010082
[ 730.728018] RAX: 0000000000000000 RBX: ffff880144df0f00 RCX: 000000000000a530
[ 730.732194] RDX: 000000000000a52f RSI: 0000000000000020 RDI: ffff880149c03900
[ 730.736343] RBP: ff88003a89db7f00 R08: 0000000000015520 R09: 0000000000000000
[ 730.740411] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880149c03900
[ 730.744328] R13: 0000000000000020 R14: ffffffff81400806 R15: 50fbd75400000000
[ 730.748066] FS: 0000000000000000(0000) GS:ffff88014e4c0000(0000) knlGS:0000000000000000
[ 730.751848] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 730.755656] CR2: 00007fff69e90860 CR3: 0000000003b99000 CR4: 00000000003407e0
[ 730.759530] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 730.763363] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 730.767157] Stack:
[ 730.770923] 0000000000000000 ffff880144df0f00 ffffffff81c01120 0000000000000020
[ 730.774792] ffff8801489d2800 ffff8801489d2800 50fbd75400000000 ffffffff81400806
[ 730.778651] ffff88009b910600 ffff88009b910600 ffff8801488b1000 ffff8801489d2800
[ 730.782538] Call Trace:
[ 730.786405] [<ffffffff81400806>] ? scsi_host_alloc_command+0x3d/0x9e
[ 730.790178] [<ffffffff8140095b>] ? scsi_get_command+0x16/0x128
[ 730.794004] [<ffffffff814086d1>] ? scsi_prep_fn+0x58/0x139
[ 730.797760] [<ffffffff81326452>] ? blk_peek_request+0xf7/0x216
[ 730.801574] [<ffffffff81408aa4>] ? scsi_request_fn+0x2f/0x5cc
[ 730.805166] [<ffffffff81323aa4>] ? __blk_run_queue+0x29/0x31
[ 730.808816] [<ffffffff81326b23>] ? blk_queue_bio+0x27d/0x2be
[ 730.812566] [<ffffffff8132493f>] ? generic_make_request+0x93/0xd0
[ 730.816295] [<ffffffff81324a7b>] ? submit_bio+0xff/0x11d
[ 730.819969] [<ffffffff8113aad9>] ? _submit_bh+0x104/0x122
[ 730.823572] [<ffffffff811fd665>] ? journal_submit_commit_record.isra.9+0x146/0x1a0
[ 730.827224] [<ffffffff811fe69f>] ? jbd2_journal_commit_transaction+0xfe0/0x16ed
[ 730.831042] [<ffffffff81060ec3>] ? pick_next_task_fair+0x325/0x3aa
[ 730.834806] [<ffffffff8107c175>] ? lock_timer_base.isra.37+0x23/0x47
[ 730.838580] [<ffffffff812035d5>] ? kjournald2+0x10e/0x321
[ 730.842231] [<ffffffff81065284>] ? add_wait_queue+0x3c/0x3c
[ 730.845955] [<ffffffff812034c7>] ? jbd2_journal_clear_features+0x73/0x73
[ 730.849599] [<ffffffff81050dee>] ? kthread+0xc5/0xcd
[ 730.853237] [<ffffffff81050d29>] ? kthread_freezable_should_stop+0x40/0x40
[ 730.856867] [<ffffffff8179ffec>] ? ret_from_fork+0x7c/0xb0
[ 730.860686] [<ffffffff81050d29>] ? kthread_freezable_should_stop+0x40/0x40
[ 730.864410] Code: e9 4d 89 f8 4c 89 f1 48 89 ea 48 89 de 41 ff 14 24 49 83 c4 10 49 83 3c 24 00 eb 36 eb 38 49 63 44 24 20 4d 8b 04 24 48 8d 4a 01 <48> 8b 5c 05 00 48 89 e8 65 49 0f c7 08 0f 94 c0 84 c0 0f 85 78
[ 730.868345] RIP [<ffffffff8110ea33>] kmem_cache_alloc+0xd9/0x113
[ 730.872267] RSP <ffff88009a6ff998>
[ 730.876056] ---[ end trace 773f3dc1692c6329 ]---

Reproduce steps:
----------------------------
1. xinit&
2. gnome-session&
3. vbank_mode=0 ./etqw.x86 +set sys_VideoRam 64 +set r_mode -1 +set in_tty 0 +exec etqw-pts.cfg +set r_customWidth 1920 +set r_customHeight 1080 +vid_restart
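The dmesg excerpt in this report shows one hang with action "reset", followed by six "Kicking stuck wait on render ring" entries about 4 seconds apart. When comparing captures across kernel and Mesa versions, it can help to pull those entries out mechanically. The sketch below is a hypothetical helper (parse_gpu_hangs is not part of any i915 or Mesa tool) and assumes the exact "GPU HANG: ecode ..., reason: ..., action: ..." wording printed by this 3.19-era kernel; other kernel versions may word the message differently.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical helper: assumes the i915 message format seen in this report, e.g.
# "[ 704.806299] [drm] GPU HANG: ecode 9:0:0x85dffffb, in etqw.x86 [7074], reason: Ring hung, action: reset"
# The "in <process> [<pid>]," part is optional because the kicked-wait entries omit it.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\]\s+GPU HANG:\s+ecode\s+(?P<ecode>[^,\s]+),"
    r"(?:\s+in\s+(?P<proc>\S+)\s+\[(?P<pid>\d+)\],)?"
    r"\s+reason:\s+(?P<reason>[^,]+),\s+action:\s+(?P<action>\w+)"
)


class GpuHang(NamedTuple):
    timestamp: float         # seconds since boot, from the dmesg prefix
    ecode: str               # error code exactly as printed by i915
    process: Optional[str]   # None when the kernel did not blame a process
    pid: Optional[int]
    reason: str
    action: str              # e.g. "reset" or "continue"


def parse_gpu_hangs(dmesg_text: str) -> List[GpuHang]:
    """Return every GPU HANG entry found in a dmesg capture, in order."""
    hangs = []
    # finditer has no line anchors, so this also works on a log flattened onto one line.
    for m in HANG_RE.finditer(dmesg_text):
        hangs.append(
            GpuHang(
                timestamp=float(m["ts"]),
                ecode=m["ecode"],
                process=m["proc"],
                pid=int(m["pid"]) if m["pid"] else None,
                reason=m["reason"].strip(),
                action=m["action"],
            )
        )
    return hangs


if __name__ == "__main__":
    sample = (
        "[ 704.806299] [drm] GPU HANG: ecode 9:0:0x85dffffb, in etqw.x86 [7074], "
        "reason: Ring hung, action: reset\n"
        "[ 708.806788] [drm] GPU HANG: ecode 9:-1:0x00000000, "
        "reason: Kicking stuck wait on render ring, action: continue\n"
    )
    for hang in parse_gpu_hangs(sample):
        print(hang.timestamp, hang.action, hang.reason, hang.process)
```

The informational "[drm] GPU hangs can indicate a bug ..." lines are not matched, because the pattern requires the literal "GPU HANG:" prefix.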
At this point, it's more likely that Mesa needs a bit more SKL work than that this is a kernel bug.
Well, either that or some other problem with an early stepping that didn't go through a very thorough binning process.
The Padman demo also causes a system hang.
Please attach the error state. Also, please try Mesa master (which now has 5b29b2922afe2b8167a589fc2896a071fc85b693).
It also causes a system hang with Mesa (5b29b292). I cannot get the full error state because of the system hang.
(In reply to ye.tian from comment #5)
> It also causes system hang with the mesa(5b29b292).
> I can not get the full error state due to system hang.

I got the full error state by reading the error file with cat; please see the attached file.
Created attachment 113340: error state info
Created attachment 113392: Set a minimum stencil qpitch

Please test. I doubt this will do anything. I can't find anything else wrong in the error state (the depth buffer offset of 0 looks fishy, but 0 is a valid active BO).
Created attachment 113438: disable RCC camming (kernel patch)

Please test this one too.
Also, please test without SIMD16 dispatch (INTEL_DEBUG=no16).
(In reply to Ben Widawsky from comment #8)
> Created attachment 113392
> Set a minimum stencil qpitch
>
> Please test. I doubt this will do anything.
>
> I can't find anything else wrong in the error state (the depth buffer offset
> looks fishy, 0, but 0 is a valid active BO).

The issue still occurs with this patch.
(In reply to Ben Widawsky from comment #10)
> Also, please test without simd16 dispatch
> INTEL_DEBUG=no16

The issue also occurs with INTEL_DEBUG=no16.
Did you test the kernel patch? https://bugs.freedesktop.org/attachment.cgi?id=113438
(In reply to Ben Widawsky from comment #13)
> Did you test the kernel patch?
> https://bugs.freedesktop.org/attachment.cgi?id=113438

Yes, I tested it, both with and without INTEL_DEBUG=no16.
Can you please attach the error state with both patches applied, and with INTEL_DEBUG=no16?
Created attachment 113455: error state info (not full)
I did not get the full error info.
Ye Tian, can you please attach your padman demo file as well as the command line you're using to invoke it?
Created attachment 113829: ETQW-demo2 logs

Ye Tian, I downloaded and installed ETQW-demo2-client-full.r1.x86.run from http://www.splashdamage.com/node/222. When I run it, the user interface loads fine with no hangs, but it fails to load textures when I try to play the game. Did you use the same version of the demo? Any suggestions for fixing the texture loading issue?
Created attachment 113838: ETQW-demo logs

Running ./etqw.x86 by itself works fine. I also downloaded and installed ETQW-demo2-client-full.r1.x86.run, but could not find the demo file, so I copied the demo file from "etqw-demo-1.1.0" to "etqw.demo". Running this command, "vbank_mode=0 ./etqw.x86 +set sys_VideoRam 64 +set r_mode -1 +set in_tty 0 +exec etqw-pts.cfg +set r_customWidth 1920 +set r_customHeight 1080 +vid_restart", also causes a system hang and a rendering error. You can try it.
Created attachment 113839: etqw config file

You can download it and put it in etqw.demo/base/.
Created attachment 113840: demos file

You can unzip it and put the folder in etqw.demo/base/.
This bug also occurs when testing Mesa 10.5rc2.
Please test this branch: http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=workarounds
Created attachment 114012: disable PMA stall workaround

Please test this backportable patch instead of my branch.
Created attachment 114013: Same as before, but it compiles this time.

Please test this on Mesa master.
(In reply to Ben Widawsky from comment #26)
> Created attachment 114013
> Same as before, but it compiles this time.
>
> Please test this on mesa master.

Tested on Mesa master (0dfec59a) with the patch; the system still hangs.
Created attachment 114081: i915_error_state info

Same as the info above.
We cannot reproduce this hang. Can you please test again, and if it still fails, update the BIOS and test again. Thanks.
Tested on Mesa master (0dfec59a). Without the patch, the demo hung consistently at a specific frame. With the patch, the demo ran fine a few times without a hang, but running it repeatedly caused random hangs at a different frame each time.
(In reply to Ben Widawsky from comment #29)
> We cannot reproduce this hang. Can you please test again, and if it still
> fails, update the BIOS and test again.
>
> Thanks.

Retested; it still fails. I cannot update the BIOS because I have not yet received the corresponding CPU. I will test again as soon as I receive the new CPU.
Ye Tian, we are seeing the same issue now. Do not worry about upgrading the BIOS.
As I mentioned in comment 30, Ben's patch changed the hang behavior. Without the patch, the demo hung at the same frame every time. With the patch, it ran fine a few times before hanging at a random frame. Ye Tian, did you also see a similar change in behavior with the patch?
With drm-intel-fixes and this patch, I can go more than an hour before I hit a hang; I've seen it go as long as 4 hours. I am not sure it is the same hang as in the original bug report.

If the behavior is confirmed, I'd like to merge the patch and close this bug.
(In reply to Ben Widawsky from comment #34)
> With drm-intel-fixes and this patch, I go more than an hour before I hit a
> hang. I've seen it go as much as 4 hours. I am not sure that is the same
> hang as the original bug report.
>
> If the behavior is confirmed, I'd like to merge the patch and close this bug.

Tested with drm-intel-fixes (5e4f51) and this patch: after running for a while, the following message appears and the game process is terminated automatically:
"5988 Segmentation fault (core dumped)"
The system does not hang, but the GPU does. Please see the dmesg and error_state info.
Created attachment 114266: dmesg info
Created attachment 114267: i915_error_state info
Tested with drm-intel-fixes (5e4f51) and the latest Mesa master (30916a5ef). The demo hung at the same frame every time. The remaining symptoms are the same as with the patch: the system does not hang, but the GPU does (see i915_error_state), with the message "drm/i915: Resetting chip after gpu hang".
Tested on the new processor.
(In reply to ye.tian from comment #35)
> (In reply to Ben Widawsky from comment #34)
> > With drm-intel-fixes and this patch, I go more than an hour before I hit a
> > hang. I've seen it go as much as 4 hours. I am not sure that is the same
> > hang as the original bug report.
> >
> > If the behavior is confirmed, I'd like to merge the patch and close this bug.
>
>
> Tested with drm-intel-fixes(5e4f51) and this patch, it will appear the below
> info after running for a while and auto interrupt the process of the games.
> "5988 Segmentation fault (core dumped)"

How long is "a while"? Do you see the same behavior with other games as well if you run them for "a while"?
(In reply to Ben Widawsky from comment #40)
> (In reply to ye.tian from comment #35)
> > (In reply to Ben Widawsky from comment #34)
> > > With drm-intel-fixes and this patch, I go more than an hour before I hit a
> > > hang. I've seen it go as much as 4 hours. I am not sure that is the same
> > > hang as the original bug report.
> > >
> > > If the behavior is confirmed, I'd like to merge the patch and close this bug.
> >
> >
> > Tested with drm-intel-fixes(5e4f51) and this patch, it will appear the below
> > info after running for a while and auto interrupt the process of the games.
> > "5988 Segmentation fault (core dumped)"
>
> How long is "a while"? Do you see the same behavior with other games as well
> if you run them for, "a while"?

Retested with drm-intel-fixes (2dccc9) and this patch. I found that the GPU hangs after about 3 minutes (picture attached), but etqw-demo keeps running very slowly. Maybe this is the same phenomenon you are seeing. The Padman demo works fine.

Tested with -nightly (f7def4) and this patch; the result is below:

time ./etqw-demo.sh
./etqw-demo.sh: line 15: 5886 Segmentation fault (core dumped) vbank_mode=0 ./etqw.x86 +set sys_VideoRam 64 +set r_mode -1 +set in_tty 0 +exec etqw-pts.cfg +set r_customWidth $w +set r_customHeight $h +vid_restart > /tmp/tmp.log 2>&1

real 0m57.088s
user 0m41.197s
sys 0m3.875s

The Padman demo works fine.
Created attachment 114345: etqw-demo picture after GPU hang
Ye Tian, can you please create a new bug for Padman, and we'll change this bug to etqw? That way, we can upstream the other fix, and deal with this separately.
A recent patch (http://patchwork.freedesktop.org/patch/44605) by Neil Roberts fixes the misrendering in the demo.
(In reply to Ben Widawsky from comment #43)
> Ye Tian, can you please create a new bug for Padman, and we'll change this
> bug to etqw? That way, we can upstream the other fix, and deal with this
> separately.

Ben, Padman works well on the latest Mesa master (f68a973d) with or without this patch, so this bug does not affect Padman. The remaining problem: running etqw-demo on the latest nightly kernel and latest Mesa produces a "Segmentation fault", and the GPU hangs at the same time.
(In reply to Anuj Phogat from comment #44)
> Recent patch (http://patchwork.freedesktop.org/patch/44605) by Neil Roberts
> fixes the misrendering in the demo.

Tested the above patch (44605); the problem still occurs when running etqw-demo: "Segmentation fault (core dumped)" and a GPU hang.
Tested on the latest nightly kernel (5ea91d) and latest Mesa (cc5860e); this issue no longer occurs on SKL. Verified.