Created attachment 68411 [details] dmesg output
The X server freezes randomly while I am using my PC for normal desktop usage (web browsing, movie watching, etc.) The box is a Lenovo IdeaPad U410 ultrabook with Core i7 Ivy Bridge CPU and I am using the integrated Intel HD 4000 graphics card. The operating system running on it is Gentoo Linux.
I found these lines in dmesg that could be relevant:
[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Here are the versions of the packages I am using:
Linux kernel 3.6.1
X.Org X Server 1.13.0 Release Date: 2012-09-05
xf86-video-intel: 2.20.9
mesa: 9.0_pre20120918
Created attachment 68412 [details] i915_error_state
Created attachment 68413 [details] Xorg log
Death sometime before the flush following a mesa batch completes. First I would make sure you have all the latest w/a, so pull one of Daniel's famous kernels. http://cgit.freedesktop.org/~danvet/drm-intel #drm-intel-fixes
Installed a 3.6.1 kernel with the patches found in the drm-intel-fixes branch you mentioned. After a few days of testing hopefully I will have good news. Yesterday I was unable to use the PC for an hour without having to restart the X server. :(
The problem still exists even with the patched kernel.
The next step is trying the latest mesa-8.0 and mesa-9.0 releases.
I have already tried both of them and unfortunately I had the same problem. I saw that a new version (2.20.10) of the xf86-video-intel package is out; I will try it too.
I'd pick the ddx from git to avoid a bug on IvyBridge with ComponentAlpha glyphs (if you use subpixel antialiasing on fonts). Also, can you attach a few more error-states in case one of those captures a more obvious cause?
What is ddx? Could you put the URL of the git repository here? Oh, I have one more piece of information that is probably important. In the last few days I disabled all the desktop effects in KDE and now the system seems to be stable.
The Device Dependent X (ddx) driver is xf86-video-intel (http://cgit.freedesktop.org/xorg/driver/xf86-video-intel). There is no urgent need to upgrade that if kwin without desktop effects is stable. That last statement points towards an interaction between mesa and the kernel as being the cause, either missing a workaround or just plain broken.
What else could I do to help you? Collect more error-states?
Yes, more error states would definitely help. Narrowing down if there is any particular effect that seems to trigger a hang would be a big help as well.
Okay, I will try to get some more.
Hi there. I have this problem too. I have an HD 4000 (gen7) from an i7 3612QM. It happens in KDE 4.8.5 with both kernel 3.6.4 and 3.7-rc3, with libdrm 2.4.39, xorg-server 1.12.2, mesa 9.0 and video-intel 2.20.12 on Gentoo Linux. UXA seems more problematic than SNA, so I might end up using UXA to reproduce this issue more often and add my error-states, if that is OK with Chris. For me mesa 8.0.4 works like a charm; I have this problem only with mesa 9.0. I should add that I see random corruption at the top of the screen, coming and going when moving the mouse around, which should not trigger desktop effects. It might be Firefox, given that it is almost always the foreground application and it uses the whole screen. I will do more testing and add some error states when I can find the time.
Does mesa 8 really fix the problem for you? I also tried mesa 8 when Chris suggested it, but the system froze with it once as well. Unfortunately I was not able to get dmesg output that time, so I couldn't be sure it was the same problem. However, since I assumed the problem was the same, I went back to mesa 9. I will probably give mesa 8 another go in the hope that it fixes my problem too.
(In reply to comment #15)
> Mesa 8 really fixes the problem for you?
mesa 8.0.4 just works for me. No corruption, no hangs. mesa 9.0 regressed. I discovered something important: I always get corruption at the top of the screen when a tooltip pops up. *Always*. Even if the tooltip is at the bottom of the screen (keep the mouse over an application in the taskbar, for example). This should be quite easy to reproduce, I think. The corruption sometimes goes away almost immediately; sometimes I have to trigger a refresh of the affected region. About the error state: I have spent all morning trying to crash the driver. I failed! It is quite hard to hang the GPU. I will keep trying. @Chris: Is there something I can do to help debug the corruption?
(In reply to comment #16)
> @Chris: Is there something I can do to help debug the corruption?
Can you capture the corruption with a photo? I see some rendering corruption in the top-left corner in various games (typically a few blocks of white) and am just wondering if this is the same.
Created attachment 69306 [details] Screen corruption
Here it is :). I reproduced the corruption with mesa git master as of 2 hours ago. Usually the corruption goes away in a fraction of a second, but not always. In this case it disappeared when I triggered a refresh of the top of the screen. Rarely, the mouse pointer can also get corrupted, as can the window decoration.
That matches what I've seen as well with recent mesa. Thanks. And from the photo we can clearly see that it is a tiling corruption - some render surface has the wrong attributes.
Created attachment 69307 [details] Xorg log 1st hang
Hello, I just "managed" to get the GPU hang twice in one hour...
My system:
Ivybridge desktop (core i7-3770 with HD 4000)
kernel 3.6.3 (64 bit)
xserver 1.13.0 with SNA enabled
mesa 9.0
xf86-video-intel 2.20.12
libdrm 2.4.39
I am attaching:
- the X server logs
- the content of i915_error_state after each hang
- the system log (I use systemd) of the last week (you can see that other hangs occurred in the past, but I had not enabled the drm debug). The last two hangs, which are related to the attached logs, occurred today at 13:13 and 14:20. Only kernel messages are shown in this log.
If you need more information or you want me to do some testing, just ask.
Created attachment 69309 [details] Xorg log 2nd hang
Created attachment 69310 [details] i915_error_state 1st hang
Created attachment 69311 [details] i915_error_state 2nd hang
Created attachment 69312 [details] System log
(In reply to comment #19)
> That matches what I've seen as well with recent mesa. Thanks. And from the
> photo we can clearly see that it is a tiling corruption - some render
> surface has the wrong attributes.
If I can do something else, just let me know. I have no problem compiling mesa master or the official branches. I can also apply custom patches or whatever else is needed. I'm on Gentoo Linux, so it is not so hard. Thank you.
Hmm, all the error states are most peculiar, dying immediately after MI_SET_CONTEXT with garbage in the IPEHR. Can someone try:
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 53ba395..f8eab98 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -252,7 +252,7 @@ void i915_gem_context_init(struct drm_device *dev)
        struct drm_i915_private *dev_priv = dev->dev_private;
        uint32_t ctx_size;
 
-       if (!HAS_HW_CONTEXTS(dev))
+       if (!HAS_HW_CONTEXTS(dev) || 1)
                dev_priv->hw_contexts_disabled = true;
 
        if (dev_priv->hw_contexts_disabled)
I am currently compiling a 3.6.4 kernel with the patch Chris provided, however the source in the latest stable kernel looks a bit different:
        if (!HAS_HW_CONTEXTS(dev)) {
                dev_priv->hw_contexts_disabled = true;
                return;
        }
I have no idea whether this is good or bad, it will turn out soon. :)
(In reply to comment #28)
> I am currently compiling a 3.6.4 kernel with the patch Chris provided,
> however the source in the latest stable kernel looks a bit different:
>
> if (!HAS_HW_CONTEXTS(dev)) {
> dev_priv->hw_contexts_disabled = true;
> return;
> }
>
> I have no idea whether this is good or bad, it will turn out soon. :)
You could also do:
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7274360..7bed78f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1148,7 +1148,7 @@ struct drm_i915_file_private {
 #define HAS_LLC(dev) (INTEL_INFO(dev)->has_llc)
 #define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)->need_gfx_hws)
-#define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
+#define HAS_HW_CONTEXTS(dev) (0)
I just added "|| 1" to the if condition to make it true in all cases. I was worried because my version is a bit different from the one Chris posted in the diff.
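For readers following along, the effect of appending "|| 1" can be sketched outside the kernel. This is only an illustration, not the i915 code: `fake_device`, `has_hw_contexts()`, `contexts_disabled()` and `contexts_disabled_forced()` are hypothetical stand-ins, and the only detail taken from the thread is the `gen >= 6` test quoted from the HAS_HW_CONTEXTS macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device info the real macro inspects. */
struct fake_device {
    int gen;
};

/* Same shape as the HAS_HW_CONTEXTS(dev) definition quoted above: gen >= 6. */
static bool has_hw_contexts(const struct fake_device *dev)
{
    assert(dev != NULL);
    return dev->gen >= 6;
}

/* Unpatched condition: contexts are disabled only where hardware lacks them. */
static bool contexts_disabled(const struct fake_device *dev)
{
    return !has_hw_contexts(dev);
}

/* Patched condition: "|| 1" makes the test true for every device. */
static bool contexts_disabled_forced(const struct fake_device *dev)
{
    return !has_hw_contexts(dev) || 1;
}
```

With a gen 7 device, `contexts_disabled()` returns false while `contexts_disabled_forced()` returns true, which is why the differing surrounding code in 3.6.4 should not matter for this test.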
Since the new kernel was booted with the patch, the system seems to be stable. The screen corruption that was also reported by others still exists, but the GPU hang issue has disappeared, even with the KDE effects enabled again. I don't know whether it makes any difference, but with the new kernel I also upgraded to xf86-video-intel-2.20.12.
Ok, I think we need to split this bug up into two parts:
- tracking the gpu hang, which seems to be due to hw contexts. I think we can leave this issue in this bug report here (title adjusted)
- the corruption, which is a regression from mesa 8 to mesa 9/master. It might be that this is simply the lack of a hw workaround, can you please first try the latest drm-intel-nightly branch from the drm-intel kernel git repo at
http://cgit.freedesktop.org/~danvet/drm-intel
If that does not help, please file a new bug report against mesa/i965, mentioning that this is a regression.
Ok, I have no idea what actual w/a the ARB_ON_OFF are for; they are documented as only being used with runlists and as a NOOP otherwise, so let's just simplify things. Can you please test with (and remember to re-enable hw contexts):
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 53ba395..198c3d5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -345,15 +345,10 @@ mi_set_context(struct intel_ring_buffer *ring,
                return ret;
        }
 
-       ret = intel_ring_begin(ring, 6);
+       ret = intel_ring_begin(ring, 4);
        if (ret)
                return ret;
 
-       if (IS_GEN7(ring->dev))
-               intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_DISABLE);
-       else
-               intel_ring_emit(ring, MI_NOOP);
-
        intel_ring_emit(ring, MI_NOOP);
        intel_ring_emit(ring, MI_SET_CONTEXT);
        intel_ring_emit(ring, new_context->obj->gtt_offset |
@@ -364,11 +359,6 @@ mi_set_context(struct intel_ring_buffer *ring,
        /* w/a: MI_SET_CONTEXT must always be followed by MI_NOOP */
        intel_ring_emit(ring, MI_NOOP);
 
-       if (IS_GEN7(ring->dev))
-               intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
-       else
-               intel_ring_emit(ring, MI_NOOP);
-
        intel_ring_advance(ring);
 
        return ret;
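The change of the reserved size from 6 to 4 dwords follows from dropping the two bracketing commands. A hedged sketch with a mock ring shows the arithmetic; `mock_ring`, `mock_emit()`, both `emit_switch_*()` helpers and the opcode values are invented for illustration and are not the i915 API or the hardware encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder opcode values, made up for this sketch. */
enum { OP_NOOP = 0, OP_ARB_OFF = 1, OP_ARB_ON = 2, OP_SET_CONTEXT = 3 };

#define MOCK_RING_DWORDS 16

struct mock_ring {
    uint32_t buf[MOCK_RING_DWORDS];
    size_t used;
};

/* Append one dword to the mock ring. */
static void mock_emit(struct mock_ring *ring, uint32_t dword)
{
    assert(ring->used < MOCK_RING_DWORDS);
    ring->buf[ring->used++] = dword;
}

/* Gen7 sequence before the patch: the switch is bracketed by ARB off/on. */
static size_t emit_switch_with_arb(struct mock_ring *ring, uint32_t ctx_addr)
{
    size_t start = ring->used;

    mock_emit(ring, OP_ARB_OFF);
    mock_emit(ring, OP_NOOP);
    mock_emit(ring, OP_SET_CONTEXT);
    mock_emit(ring, ctx_addr);
    mock_emit(ring, OP_NOOP);      /* NOOP that must follow the switch */
    mock_emit(ring, OP_ARB_ON);
    return ring->used - start;     /* number of dwords emitted */
}

/* Sequence after the patch: the two bracketing dwords are gone. */
static size_t emit_switch_plain(struct mock_ring *ring, uint32_t ctx_addr)
{
    size_t start = ring->used;

    mock_emit(ring, OP_NOOP);
    mock_emit(ring, OP_SET_CONTEXT);
    mock_emit(ring, ctx_addr);
    mock_emit(ring, OP_NOOP);      /* NOOP that must follow the switch */
    return ring->used - start;     /* number of dwords emitted */
}
```

The first helper emits six dwords and the second four, matching the two arguments passed to intel_ring_begin() in the diff.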
(In reply to comment #32)
> - the corruption, which is a regression from mesa 8 to mesa 9/master. It
> might be that this is simply the lack of a hw workaround, can you please
> first try the latest drm-intel-nightly branch from the drm-intel kernel git
> repo at
>
> http://cgit.freedesktop.org/~danvet/drm-intel
>
> If that does not help, please file a new bug report against mesa/i965,
> mentioning that this is a regression.
Tested, the corruption is still there, I opened bug #56610
Created attachment 69357 [details] Error state with patch on comment #33 applied
I applied the patch from comment #33 to linux 3.7-rc3. I had 2 GPU hangs in a row. This is the error state.
Ok, no change there then. That implies that it is the MI_SET_CONTEXT alone that causes it to execute -1, or at least die with IPEHR==-1.
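A note on the notation: the register is a 32-bit value, so "-1" corresponds to 0xffffffff in a dump. The sketch below checks a single line for that value. It assumes the error state prints the register as a line of the form "IPEHR: 0x...", which should be verified against an actual dump; `parse_ipehr()` and `ipehr_is_all_ones()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a line assumed to look like "  IPEHR: 0xffffffff".
 * Returns true and stores the value on success, false otherwise.
 */
static bool parse_ipehr(const char *line, uint32_t *out)
{
    unsigned int value;

    assert(line != NULL && out != NULL);
    if (sscanf(line, " IPEHR: 0x%x", &value) != 1)
        return false;
    *out = (uint32_t)value;
    return true;
}

/* "IPEHR == -1" in the thread's shorthand means every bit is set. */
static bool ipehr_is_all_ones(uint32_t ipehr)
{
    return ipehr == (uint32_t)-1;
}
```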
(In reply to comment #34)
> (In reply to comment #32)
> > - the corruption, which is a regression from mesa 8 to mesa 9/master. It
> > might be that this is simply the lack of a hw workaround, can you please
> > first try the latest drm-intel-nightly branch from the drm-intel kernel git
> > repo at
> >
> > http://cgit.freedesktop.org/~danvet/drm-intel
> >
> > If that does not help, please file a new bug report against mesa/i965,
> > mentioning that this is a regression.
>
> Tested, the corruption is still there, I opened bug #56610
Just to inform everyone on the CC list: it seems this is a bug in kwin and it has been fixed (by Kenneth Graunke). Please see my comment in the report of bug #56610.
Stefano, could you try this idea from Ben to rule out mesa fouling up the context:
diff --git a/src/mesa/drivers/dri/i965/brw_vtbl.c b/src/mesa/drivers/dri/i965/brw_vtbl.c
index ca2e7a9..62d609b 100644
--- a/src/mesa/drivers/dri/i965/brw_vtbl.c
+++ b/src/mesa/drivers/dri/i965/brw_vtbl.c
@@ -178,7 +178,7 @@ static void brw_new_batch( struct intel_context *intel )
     * would otherwise be stored in the context (which for all intents and
     * purposes means everything).
     */
-   if (intel->hw_ctx == NULL)
+   if (intel->hw_ctx == NULL || 1)
       brw->state.dirty.brw |= BRW_NEW_CONTEXT;
 
    brw->state.dirty.brw |= BRW_NEW_BATCH;
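The idea behind this test patch can be sketched with plain dirty-bit flags: forcing the "new context" bit on every batch makes the driver re-emit all state instead of trusting what the hardware context saved. This is a simplified model, not mesa code; `sketch_state`, `sketch_new_batch()` and the `SKETCH_*` flag values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values invented for this sketch; mesa's real bit assignments differ. */
#define SKETCH_NEW_CONTEXT (1u << 0)
#define SKETCH_NEW_BATCH   (1u << 1)

struct sketch_state {
    bool has_hw_ctx;   /* stands in for intel->hw_ctx != NULL */
    uint32_t dirty;    /* stands in for brw->state.dirty.brw */
};

/*
 * Mark state dirty at the start of a batch. With force_reemit set
 * (the "|| 1" in the patch), the context bit is raised on every batch
 * even when a hardware context is available.
 */
static void sketch_new_batch(struct sketch_state *st, bool force_reemit)
{
    assert(st != NULL);
    if (!st->has_hw_ctx || force_reemit)
        st->dirty |= SKETCH_NEW_CONTEXT;
    st->dirty |= SKETCH_NEW_BATCH;
}
```

If hangs persisted with the forced variant, stale state restored from the context would become a less likely explanation.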
(In reply to comment #38)
> Stefano, could you try this idea from Ben to rule out mesa fouling up the
> context:
Sure, I will try this patch and let you know.
For 5 days I have been running kwin patched with the fix found in bug #56610. The screen corruption is definitely gone, and the GPU hang is gone too! I'm not able to hang it anymore. The kernel is freshly compiled (I updated to 3.6.5) without Chris' patches. Mesa is also without patches.
KDE has recently been upgraded to 4.9.3 on my machine and it seems to be a lot better than the previous version. Animations of the effects now seem to be as smooth as they should be. With 4.9.2 it felt like playing a 3D game at low FPS. The fix for the KDE problem pointed out in bug #56610 is included in this release, which could be what did the trick.
Stefano, both Enrico and Zoltan report that their systems are stable with an updated kwin that avoids the incorrect msaa rendering. Can you confirm? In that case we can close this as a side-effect of bug #56610.
Yes, no more GPU hangs since I updated kwin. I have been running both with the mesa patch suggested in comment #38 (for about 3 weeks) and without it (last week I updated to a clean mesa 9.0.1). Thanks for taking care of this issue.
Thanks everyone. *** This bug has been marked as a duplicate of bug 56610 ***