[ 25.271395] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 38.785285] [drm] GPU HANG: ecode 7:0:0x8edcfff1, in TSK_VEncode4 [1674], reason: Hang on render ring, action: reset
[ 38.817146] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 38.848304] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 38.879736] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 38.912668] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 38.944964] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 44.655797] EXT4-fs (ram0): re-mounted. Opts: (null)
[ 47.697675] drm/i915: Resetting chip after gpu hang
Created attachment 136914: dmesg log and i915_error_state
The version info is "Linux haswell 4.8.0haswell #16 SMP Wed Nov 15 15:44:20 CST 2017 x86_64 GNU/Linux".
VAAPI version is 1.8.3, libdrm version is 2.4.81, and intel-vaapi-driver version is 1.8.3.
*** Bug 104747 has been marked as a duplicate of this bug. ***
(In reply to Elizabeth from comment #5)
> *** Bug 104747 has been marked as a duplicate of this bug. ***

(In reply to zhoubo from comment #3)
> (In reply to Elizabeth from comment #1)
> > Hello Zhoubo. If reproducible, could you try a more recent kernel
> > https://www.kernel.org? Thanks.
>
> I think the cause of the GPU hang may be the encoder's rate control mode.
> First I chose VBR mode, and a GPU hang occurred within 10 minutes at most.
> Then I chose CBR mode, and the GPU hang did not occur again.
> I found some differences between the two modes in the i965 driver, but I
> cannot confirm which one is the bug.
>
> START: 0x00312000
> HEAD: 0x02006e30
> TAIL: 0x00008848
> CTL: 0x0001f001
> HWS: 0x00311000
> ACTHD: 0x00000000 6a934b44
> IPEIR: 0x00000000
> IPEHR: 0x71000007
> INSTDONE: 0xffdcffff
> BBADDR: 0x00000000 6a934b45
> BB_STATE: 0x00000120
> INSTPS: 0x80000208
> INSTPM: 0x00006080
> FADDR: 0x00000000 6a934d00
>
> According to the GPU hang info, "IPEHR: 0x71000007" means the GPU hung
> while executing this command. I suspect a parameter may be wrong in
> "gen75_mfc_batchbuffer_emit_object_command" or
> "gen75_vme_fill_vme_batchbuffer", because that is where
> "*command_ptr++ = (CMD_MEDIA_OBJECT | (9 - 2))" is emitted.
>
> So I think some parameter differs between CBR and VBR, which leads to a
> different "CMD_MEDIA_OBJECT" command and finally to the GPU hang.
> Is this right?
> If it is, could you help me find which part is the bug?

Hello Zhoubo,
I believe both bugs have the same root cause; that's why I marked them as duplicates. According to your logs, you're using a 4.8 kernel, while the current stable release is 4.14+, so the issue may already be fixed by a recent commit. Could you please try a 4.14+ or drm-tip kernel? You may also want to visit https://01.org/linuxgraphics/community and try asking the developer community on IRC about your issue.
Elizabeth, thanks for your help. We are trying 4.14.15 to test whether it solves the bug. But even if it works, updating the kernel is a high risk for us because our product is close to release. So could you give me a patch or the commit that fixes the bug?
(In reply to zhoubo from comment #7)
> Elizabeth,
> Thanks for your help. We are trying to use 4.14.15 to test if the bug will
> be solved.
> But we need to take a high risk to update the kernel even if it works
> because our product is close to release. So could you give me a patch or the
> commit log to solve the bug?

Hello again Zhoubo. In this case, if you have an agreement with Intel, you can raise the priority by escalating this issue through the proper internal channel. Otherwise, if kernel 4.15 fixes the issue, I recommend comparing the code between 4.8 and 4.15 in the areas you suspect are involved, to see which change fixes it. Thank you.
First of all, sorry about the spam. This is a mass update of our bugs. Sorry if you find this annoying, but we are trying to find out whether each bug is still valid. If investigation of this bug is still in progress, please ignore this, and I apologize! If you think this bug is no longer valid, please comment that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), please do so to see whether the issue still occurs there; if you cannot reproduce it there, please comment on the bug.
Closing; please re-open if the issue still exists.