We need a notch more information here ... see https://01.org/linuxgraphics/documentation/how-report-bugs-0
Created attachment 80827 [details] Relevant lines from dmesg
Created attachment 80828 [details] debug info from kernel
I've been fighting with this problem since 3.8.x and appreciate that the latest kernel (3.10.x) seems to have this well under control. In particular for 3.8.0, pounding heavily on the drm causes hang checks that require an X restart or a full reboot. With 3.8.3 and esp. 3.10.x the hangs are more gracefully handled (yay!) but they still occur. I can reliably (although still randomly) cause a GPU hang by running three glxgears and then watching a youtube video at HD res.

Additional system info gathered using apport:

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: xorg 1:7.7+1ubuntu4
Uname: Linux 3.10.0-994-generic x86_64
.tmp.unity.support.test.0:
ApportVersion: 2.9.2-0ubuntu8.1
Architecture: amd64
CompizPlugins: [core,composite,opengl,compiztoolbox,decor,vpswitch,snap,mousepoll,resize,place,move,wall,grid,regex,imgpng,session,gnomecompat,animation,fade,unitymtgrabhandles,workarounds,scale,expo,ezoom,unityshell]
CompositorRunning: compiz
CompositorUnredirectDriverBlacklist: '(nouveau|Intel).*Mesa 8.0'
CompositorUnredirectFSW: true
Date: Fri Jun 14 16:02:29 2013
DistUpgraded: Fresh install
DistroCodename: raring
DistroVariant: ubuntu
EcryptfsInUse: Yes
ExtraDebuggingInterest: Yes
GpuHangFrequency: Several times a day
GpuHangReproducibility: Yes, I can easily reproduce it
GpuHangStarted: Immediately after installing this version of Ubuntu
GraphicsCard:
 Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09) (prog-if 00 [VGA controller])
 Subsystem: Lenovo Device [17aa:21da]
InstallationDate: Installed on 2013-04-27 (48 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Release amd64 (20130424)
Lsusb:
 Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 04f2:b217 Chicony Electronics Co., Ltd Lenovo Integrated Camera (0.3MP)
MachineType: LENOVO 4286CTO
MarkForUpload: True
PlymouthDebug: Error: [Errno 13] Permission denied: '/var/log/plymouth-debug.log'
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.10.0-994-generic root=UUID=f9722d0d-2787-4da4-8c83-23da91112a32 ro crashkernel=384M-2G:64M,2G-:128M quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/11/2013
dmi.bios.vendor: LENOVO
dmi.bios.version: 8DET68WW (1.38 )
dmi.board.asset.tag: Not Available
dmi.board.name: 4286CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr8DET68WW(1.38):bd04/11/2013:svnLENOVO:pn4286CTO:pvrThinkPadX220:rvnLENOVO:rn4286CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 4286CTO
dmi.product.version: ThinkPad X220
dmi.sys.vendor: LENOVO
version.compiz: compiz 1:0.9.9~daily13.04.18.1~13.04-0ubuntu1
version.ia32-libs: ia32-libs 20090808ubuntu36
version.libdrm2: libdrm2 2.4.45+git20130607.a0178c00-0ubuntu0sarvatt~raring
version.libgl1-mesa-dri: libgl1-mesa-dri 9.2.0~git20130612.adf324ad-0ubuntu0sarvatt~raring
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 9.2.0~git20130612.adf324ad-0ubuntu0sarvatt~raring
version.xserver-xorg-core: xserver-xorg-core 2:1.13.4~git20130508+server-1.13-branch.10c42f57-0ubuntu0ricotz~raring
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu2b2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.1.99+git20130531.bd2557ea-0ubuntu0sarvatt~raring
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.21.9+git20130612.1f180b89-0ubuntu0sarvatt~raring
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.7+git20130516.bf72ae1f-0ubuntu0sarvatt~raring
xserver.bootTime: Fri Jun 14 15:06:42 2013
xserver.configfile: default
xserver.errors:
xserver.logfile: /var/log/Xorg.0.log
xserver.version: 2:1.13.4~git20130508+server-1.13-branch.10c42f57-0ubuntu0ricotz~raring
xserver.video_driver: intel
I thought I should mention, just in case it's not clear, that the GPU hangs randomly, so my prescription of running multiple instances of glxgears etc. was just a recipe for forcing the issue. Earlier today, I was getting GPU hangs every 3 minutes or so. I'm including the error_state file from one of these in case it's useful. I was doing nothing "outrageous" at the time, e.g. editing a source file in emacs and looking at a gnuplot window. Rebooting seemed to improve the situation.
Created attachment 80963 [details] The i915_error_state file after a series of frequent GPU hangs
Can you please grab a few more error states? That first one looks to be a blorp (mesa/i965) failure.
That's curious. When these issues began with the stock Ubuntu 13.04 kernel, I first tried upgrading the intel mesa stuff from the bleeding edge X repository. That didn't help. Then I tried newer and newer kernels. Maybe you guys have fixed the kernel issue and I've made things worse with the experimental mesa drivers. I will downgrade, see how the original "stable" mesa libs perform with the new kernel, and grab any error states.
Having downgraded to Mesa 9.1.1 from the stable repository I've found that the problems are gone (so far) for kernels 3.9.6 and 3.10.0rc6, although still present at 3.8.x. Sorry about that confusion. But I'm grateful for the attention and help.
Looks like I spoke too soon: here is another error state on kernel 3.9.6. Required an X11 restart.
Created attachment 81269 [details] Error state from GPU hang on the 3.9.6 kernel.
And the death is still caused by a mesa blorp operation.
Ok, that is good to know. I'm glad to hear that the kernel issues are really fixed. But what to do about these mesa blorps? I guess I'll file a report with the Ubuntu folks.
This keeps happening with kernel 3.9.6. The last hang was not recoverable, so there is no error state for it. Even if it's a blorp, there is clearly kernel dependence. Can I use the i915_error_state file myself to get some insight? Or at least tell if the problem is due to a mesa blorp? I'd sure like to get to the bottom of this. What a nuisance!
Mesa's blorp is just the fancy copypixel engine i965_dri.so uses. Upgrading to latest mesa git should resolve this.
I tried using Mesa from git; the hangs are worse. Seems that the best strategy is to use kernel 3.10 rcX with Mesa 9.1.1. Does that make sense to you in any way?
Created attachment 81357 [details] [review] more w/a flushes for gen6 blorb Please try out the attached mesa patch, thanks.
That patch may be helping: drm is still reporting hangchecks, but all of them have recovered so far. See attached error state. The best combination still seems to be 3.10-rc6 with Mesa 9.1.1. I do not believe that this combo has hung yet.
Created attachment 81413 [details] Error state after patching the git mesa drivers.
Can you please try:

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 7e41c84..798c727 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -1079,7 +1079,7 @@ static void upload_state_base_address( struct brw_context *brw )
         * If this isn't programmed to a real bound, the sampler border color
         * pointer is rejected, causing border color to mysteriously fail.
         */
-       OUT_BATCH(0xfffff001);
+       OUT_BATCH(0x7ffff001);
        OUT_BATCH(1); /* Indirect object upper bound */
        OUT_BATCH(1); /* Instruction access upper bound */
        ADVANCE_BATCH();
diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.cpp b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
index 3ccd90e..a0ed34c 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.cpp
@@ -97,7 +97,7 @@ gen6_blorp_emit_state_base_address(struct brw_context *brw,
         * If this isn't programmed to a real bound, the sampler border color
         * pointer is rejected, causing border color to mysteriously fail.
         */
-       OUT_BATCH(0xfffff001);
+       OUT_BATCH(0x7ffff001);
        OUT_BATCH(1); /* IndirectObjectUpperBound*/
        OUT_BATCH(1); /* InstructionAccessUpperBound */
        ADVANCE_BATCH();
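For anyone reading along, here is a rough sketch of what the two constants in the diff appear to encode. It assumes the dword follows the usual gen6 STATE_BASE_ADDRESS convention, where bit 0 is a "modify enable" flag and bits 31:12 hold a 4 KiB-aligned upper bound. That layout is an assumption, not something stated in this thread, and the helper name is hypothetical.

```python
# Hypothetical helper for illustration only.
# ASSUMPTION: bit 0 = modify enable, bits 31:12 = 4 KiB-aligned upper bound.
MODIFY_ENABLE_BIT = 0x1
BOUND_MASK = 0xFFFFF000  # bits 31:12


def decode_upper_bound(dword):
    """Split a 32-bit upper-bound dword into (bound_address, modify_enable)."""
    return dword & BOUND_MASK, bool(dword & MODIFY_ENABLE_BIT)


# Compare the value before and after the proposed patch.
for value in (0xFFFFF001, 0x7FFFF001):
    bound, enable = decode_upper_bound(value)
    print(f"{value:#010x}: bound={bound:#010x} modify_enable={enable}")
```

Under that assumption, both values keep the modify-enable bit set, and the patch only lowers the bound from just under 4 GiB to just under 2 GiB.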
I tried both patches, both singly and together, and the unpatched git drivers (three tests). AFAICT, they all lead to a similar amount of hang checking under graphics load. The details: my test consists of running three glxgears instances and then opening up firefox and trying to watch a youtube video. Of course, this is not something I generally do but it does seem to generate GPU hangs so it's a good test. In some cases, simply using compiz transitions was sufficient to get a hang, once the glxgears processes were running. In all three tests with the git drivers, I saw 4-5 hangs in a few minutes. All of them recovered. I then downgraded the drivers, restarted and performed the same test: no hangs at all.
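To make "4-5 hangs in a few minutes" comparable between driver builds, the count could be taken from the kernel log rather than by eye. The sketch below is only an illustration: the function name is hypothetical, and the substrings it matches ("hangcheck", "GPU hung", "GPU hang") are assumptions about the i915 messages, whose exact wording differs between kernel versions.

```python
import re

# ASSUMPTION: hang reports contain one of these substrings; adjust the
# pattern to whatever your kernel actually prints.
HANG_PATTERN = re.compile(r"hangcheck|gpu hung|gpu hang", re.IGNORECASE)


def count_hang_lines(log_text, pattern=HANG_PATTERN):
    """Count the lines of a kernel log dump that look like GPU hang reports."""
    return sum(1 for line in log_text.splitlines() if pattern.search(line))


# Example use (not run here): save the kernel log before and after a test
# run, e.g. with `dmesg > after.txt`, then compare the two counts:
#   with open("after.txt") as f:
#       print(count_hang_lines(f.read()))
```

Counting lines overestimates the number of hangs if the kernel prints several lines per hang, so the pattern may need tightening to a single message per event.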
Created attachment 81425 [details] error state for old Mesa 9.1.1 drivers Experienced the first hangcheck using 3.10-rc6 and Mesa 9.1.1. Included here just in case it's helpful.
The last hangcheck is not associated with a blorp...
Created attachment 87627 [details] Relevant dmesg lines
Created attachment 87628 [details] The i915_error_state file after recent GPU hang
Created attachment 87629 [details] Xorg system log, with possibly relevant info to the hang
It's been a while since I reported this problem, and it's less often fatal (i.e. requiring a full reboot) with recent kernels and Mesa packages. Hangs requiring a reboot still occur about once a week in normal usage (still way too frequent, yes?). I'm currently using kernel 3.12.0-rc3 and the latest Mesa packages compiled by the xorg-edgers team (obtained from the xorg-edgers ppa). Any advice?
Please test Ken's snb blorp fixes from http://cgit.freedesktop.org/~kwg/mesa/log/?h=snbfixes
Please let us know whether this is still a problem with the latest Mesa (12.0.1).
The problem seems to be gone with Mesa 11.2.0. Thanks for following up.