Summary: | [SKL]etqw system hang | ||
---|---|---|---|
Product: | Mesa | Reporter: | ye.tian <yex.tian> |
Component: | Drivers/DRI/i965 | Assignee: | Ben Widawsky <ben> |
Status: | VERIFIED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | critical | ||
Priority: | highest | CC: | intel-gfx-bugs, james.ausmus, nroberts |
Version: | git | ||
Hardware: | All | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | error state info; Set a minimum stencil qpitch; disable RCC camming (kernel patch); error state info (not full); ETQW-demo2 logs; ETQW-demo logs; etqw config file; demos file; disable PMA stall workaround; Same as before, but it compiles this time.; i915_error_state info; dmesg info; i915_error_state info; etqw-demo picture after GPU hang |
Description
ye.tian
2015-02-09 11:26:26 UTC
At this point it's more likely that Mesa needs a bit more SKL work than it being a kernel bug. Well, either that or some other problem on an early stepping that didn't go through a very thorough binning process.

Padman demo also causes system hang.

Please attach error state. And please try master (which now has 5b29b2922afe2b8167a589fc2896a071fc85b693).

It also causes system hang with the mesa(5b29b292).
I can not get the full error state due to system hang.

(In reply to ye.tian from comment #5)
> It also causes system hang with the mesa(5b29b292).
> I can not get the full error state due to system hang.
I got the full error state by running cat on the file; please see the attached file.

Created attachment 113340 [details]
error state info
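For readers following along, saving the error state with cat, as described above, can be sketched as a small helper. This is only an illustration: the function name `save_error_state` is hypothetical, and the candidate paths in the usage comment are assumptions that depend on the kernel version and on debugfs being mounted.

```shell
#!/bin/sh
# Sketch: copy the first readable i915 error state file to an output file.
# Assumption: the caller passes candidate source paths; which one exists
# depends on the kernel in use.
save_error_state() {
    out=$1
    shift
    for src in "$@"; do
        if [ -r "$src" ]; then
            if cat "$src" > "$out"; then
                # Flush to disk in case the machine hangs again shortly after.
                sync
                echo "saved $src to $out"
                return 0
            fi
        fi
    done
    echo "no readable error state file found" >&2
    return 1
}

# Possible invocation (run as root; paths are assumptions):
#   save_error_state /tmp/i915_error_state.txt \
#       /sys/class/drm/card0/error \
#       /sys/kernel/debug/dri/0/i915_error_state
```

If the whole system hangs rather than only the GPU, the file may be unreadable until after a reboot, which matches the difficulty reported in this thread.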
Created attachment 113392 [details] [review]
Set a minimum stencil qpitch
Please test. I doubt this will do anything.
I can't find anything else wrong in the error state (the depth buffer offset looks fishy, 0, but 0 is a valid active BO).

Created attachment 113438 [details] [review]
disable RCC camming (kernel patch)
Please test this one too.

Also, please test without simd16 dispatch
INTEL_DEBUG=no16

(In reply to Ben Widawsky from comment #8)
> Created attachment 113392 [details] [review] [review]
> Set a minimum stencil qpitch
>
> Please test. I doubt this will do anything.
>
> I can't find anything else wrong in the error state (the depth buffer offset
> looks fishy, 0, but 0 is a valid active BO).
The issue still exists with this patch.

(In reply to Ben Widawsky from comment #10)
> Also, please test without simd16 dispatch
> INTEL_DEBUG=no16
The issue also exists with this setting.

Did you test the kernel patch?
https://bugs.freedesktop.org/attachment.cgi?id=113438

(In reply to Ben Widawsky from comment #13)
> Did you test the kernel patch?
> https://bugs.freedesktop.org/attachment.cgi?id=113438
Yes, I did test it, both with and without INTEL_DEBUG=no16.

Can you please attach the error state with both patches applied, and INTEL_DEBUG=no16?

Created attachment 113455 [details]
error state info (not full)
I did not get the full error info.

Ye Tian, can you please attach your padman demo file as well as the command line you're using to invoke it?

Created attachment 113829 [details]
ETQW-demo2 logs
Ye Tian, I downloaded and installed ETQW-demo2-client-full.r1.x86.run from http://www.splashdamage.com/node/222. On executing, the user interface loads fine with no hangs, but textures failed to load when I tried playing the game. Did you use the same version of the demo? Any suggestions to fix the texture loading issue?

Created attachment 113838 [details]
ETQW-demo logs
Running ./etqw.x86 on its own works fine.
I also downloaded and installed ETQW-demo2-client-full.r1.x86.run, but could not find the demo file, so I copied the demo file from "etqw-demo-1.1.0" to "etqw.demo". Running the command "vbank_mode=0 ./etqw.x86 +set sys_VideoRam 64 +set r_mode -1 +set in_tty 0 +exec etqw-pts.cfg +set r_customWidth 1920 +set r_customHeight 1080 +vid_restart" also causes a system hang and a render error.
You can try it.
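The reporter's wrapper script is not included in this thread, so the following is only a sketch of how the command line quoted above could be parameterized by resolution. The function name `etqw_cmd` and the default values are hypothetical; the game options themselves are copied from the comment above.

```shell
#!/bin/sh
# Sketch: build the ETQW demo command line for a given resolution.
# Assumption: defaults of 1920x1080 mirror the values quoted in the report.
etqw_cmd() {
    w=${1:-1920}
    h=${2:-1080}
    printf '%s\n' "vbank_mode=0 ./etqw.x86 +set sys_VideoRam 64 +set r_mode -1 +set in_tty 0 +exec etqw-pts.cfg +set r_customWidth $w +set r_customHeight $h +vid_restart"
}

# To run it from the game directory and keep a log:
#   eval "$(etqw_cmd 1920 1080)" > /tmp/tmp.log 2>&1
# To repeat the test without SIMD16 dispatch, as requested earlier:
#   INTEL_DEBUG=no16 sh -c "$(etqw_cmd 1920 1080)" > /tmp/tmp.log 2>&1
```

Printing the command instead of running it directly keeps the helper free of side effects, so the exact invocation can be pasted into a bug comment.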
Created attachment 113839 [details]
etqw config file
You can download and put it in etqw.demo/base/.
Created attachment 113840 [details]
demos file
You can unzip and put this folder in etqw.demo/base/.
This bug also exists in Mesa 10.5rc2 testing.

Please test this branch:
http://cgit.freedesktop.org/~bwidawsk/mesa/log/?h=workarounds

Created attachment 114012 [details] [review]
disable PMA stall workaround
Please test this backportable patch instead of my branch.

Created attachment 114013 [details] [review]
Same as before, but it compiles this time.
Please test this on mesa master.

(In reply to Ben Widawsky from comment #26)
> Created attachment 114013 [details] [review] [review]
> Same as before, but it compiles this time.
>
> Please test this on mesa master.
Tested this on mesa master(0dfec59a) with the patch; the system still hangs.

Created attachment 114081 [details]
i915_error_state info
Same as the info above.
We cannot reproduce this hang. Can you please test again, and if it still fails, update the BIOS and test again.

Thanks.

Tested on mesa master(0dfec59a). Without the patch, the demo hanged consistently at a specific frame. With the patch, the demo ran fine a few times without a hang. But executing it multiple times causes random hangs at a different frame every time.

(In reply to Ben Widawsky from comment #29)
> We cannot reproduce this hang. Can you please test again, and if it still
> fails, update the BIOS and test again.
>
> Thanks.
Re-tested; it still fails. I cannot update the BIOS, because I have not received the corresponding CPU. I will test again as soon as I receive the new CPU.

Ye Tian, we are seeing the same issue now. Do not worry about upgrading the BIOS.

As I mentioned in comment 30, Ben's patch changed the behavior of the hang. Without the patch, the demo hanged at a specific frame every time. With the patch, it ran fine a few times before hanging at a random frame. Ye Tian, did you also see a similar change of behavior after the patch?

With drm-intel-fixes and this patch, I go more than an hour before I hit a hang. I've seen it go as much as 4 hours. I am not sure that is the same hang as the original bug report.

If the behavior is confirmed, I'd like to merge the patch and close this bug.

(In reply to Ben Widawsky from comment #34)
> With drm-intel-fixes and this patch, I go more than an hour before I hit a
> hang. I've seen it go as much as 4 hours. I am not sure that is the same
> hang as the original bug report.
>
> If the behavior is confirmed, I'd like to merge the patch and close this bug.
Tested with drm-intel-fixes(5e4f51) and this patch, it will appear the below info after running for a while and auto interrupt the process of the games.
"5988 Segmentation fault (core dumped)"
The system does not hang, but the GPU hangs. Please see the dmesg and error_state info.

Created attachment 114266 [details]
dmesg info
Created attachment 114267 [details]
i915_error_state info
Tested with drm-intel-fixes(5e4f51) and latest Mesa master (30916a5ef). I saw that the demo hanged at a specific frame every time. The rest of the issue is the same as with the patch: the system does not hang, the GPU hangs, and an i915_error_state is recorded.
"drm/i915: Resetting chip after gpu hang"

Tested on new processor.

(In reply to ye.tian from comment #35)
> (In reply to Ben Widawsky from comment #34)
> > With drm-intel-fixes and this patch, I go more than an hour before I hit a
> > hang. I've seen it go as much as 4 hours. I am not sure that is the same
> > hang as the original bug report.
> >
> > If the behavior is confirmed, I'd like to merge the patch and close this bug.
>
>
> Tested with drm-intel-fixes(5e4f51) and this patch, it will appear the below
> info after running for a while and auto interrupt the process of the games.
> "5988 Segmentation fault (core dumped)"
How long is "a while"? Do you see the same behavior with other games as well if you run them for, "a while"?

(In reply to Ben Widawsky from comment #40)
> (In reply to ye.tian from comment #35)
> > (In reply to Ben Widawsky from comment #34)
> > > With drm-intel-fixes and this patch, I go more than an hour before I hit a
> > > hang. I've seen it go as much as 4 hours. I am not sure that is the same
> > > hang as the original bug report.
> > >
> > > If the behavior is confirmed, I'd like to merge the patch and close this bug.
> >
> >
> > Tested with drm-intel-fixes(5e4f51) and this patch, it will appear the below
> > info after running for a while and auto interrupt the process of the games.
> > "5988 Segmentation fault (core dumped)"
>
> How long is "a while"? Do you see the same behavior with other games as well
> if you run them for, "a while"?
Re-tested with drm-intel-fixes(2dccc9) and this patch. I found that the GPU hangs after running for about 3 minutes (picture attached), but the etqw-demo also runs very slowly. Maybe this is the same phenomenon that you see. The padman demo is good.
Tested with -nightly(f7def4) and this patch. The run result is below:
time ./etqw-demo.sh
./etqw-demo.sh: line 15: 5886 Segmentation fault (core dumped) vbank_mode=0 ./etqw.x86 +set sys_VideoRam 64 +set r_mode -1 +set in_tty 0 +exec etqw-pts.cfg +set r_customWidth $w +set r_customHeight $h +vid_restart > /tmp/tmp.log 2>&1
real 0m57.088s
user 0m41.197s
sys 0m3.875s
The padman demo is good.

Created attachment 114345 [details]
etqw-demo picture after GPU hang
Ye Tian, can you please create a new bug for Padman, and we'll change this bug to etqw? That way, we can upstream the other fix, and deal with this separately.

Recent patch (http://patchwork.freedesktop.org/patch/44605) by Neil Roberts fixes the misrendering in the demo.

(In reply to Ben Widawsky from comment #43)
> Ye Tian, can you please create a new bug for Padman, and we'll change this
> bug to etqw? That way, we can upstream the other fix, and deal with this
> separately.
Ben, Padman works well on latest mesa master (f68a973d) with or without this patch. So this bug does not affect Padman.
The remaining problem: running the etqw-demo produces "Segmentation fault" on the latest nightly kernel and latest mesa, and the GPU hangs.

(In reply to Anuj Phogat from comment #44)
> Recent patch (http://patchwork.freedesktop.org/patch/44605) by Neil Roberts
> fixes the misrendering in the demo.
Tested the above patch (44605); the problem still exists after running etqw-demo: "Segmentation fault (core dumped)" and GPU hang.

Tested on the latest nightly kernel(5ea91d) and latest mesa(cc5860e4); this issue does not exist on skl. Verified it.

Tested on the latest nightly kernel(5ea91d) and latest mesa(cc5860e); this issue does not exist on skl. Verified it. |