Bug 25475 - [i915] Xorg crash / Execbuf while wedged
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: highest critical
Assigned To: Carl Worth
QA Contact: Xorg Project Team
Keywords: regression
 
Reported: 2009-12-06 08:30 UTC by Tobias Jakobi
Modified: 2010-02-19 03:20 UTC (History)

Attachments
xorg log (16.78 KB, text/plain), 2009-12-06 08:30 UTC, Tobias Jakobi
kernel log / dmesg (15.12 KB, text/plain), 2009-12-06 08:30 UTC, Tobias Jakobi
intel_gpu_dump output (49.73 KB, application/octet-stream), 2009-12-07 06:37 UTC, Tobias Jakobi
new kernel log (188.94 KB, text/plain), 2009-12-07 06:40 UTC, Tobias Jakobi
Xorg log (59.33 KB, text/x-log), 2009-12-13 23:14 UTC, Petar Velkovski
kern.log (571.51 KB, text/x-log), 2009-12-13 23:14 UTC, Petar Velkovski
Intel gpu dump from the last crash (48.72 KB, application/x-lzma), 2009-12-15 05:32 UTC, Petar Velkovski
dmesg output (101.45 KB, text/plain), 2009-12-15 05:33 UTC, Petar Velkovski
Intel GPU dump 2 (48.95 KB, application/x-lzma), 2009-12-17 02:02 UTC, Petar Velkovski
intel_gpu_dump - not original reporter (114.88 KB, application/octet-stream), 2009-12-22 17:30 UTC, Robert Hooker (Sarvatt)
intel_gpu_dump 2 - not original reporter (115.75 KB, application/octet-stream), 2009-12-23 07:52 UTC, Robert Hooker (Sarvatt)
Record batch buffer at time of error (7.83 KB, patch), 2010-01-05 02:43 UTC, Chris Wilson
latest intel_gpu_dump output (50.86 KB, application/octet-stream), 2010-01-18 12:58 UTC, Tobias Jakobi
Tomas M. intel gpu dump (79.86 KB, text/plain), 2010-01-19 09:32 UTC, Tomas M.
Record batch buffer at time of error [v2] (8.71 KB, patch), 2010-01-19 10:59 UTC, Chris Wilson
i915_error_state (3.12 KB, text/plain), 2010-01-26 13:00 UTC, Vasily Khoruzhick
got this with the first patch. and kernel 2.6.32.6 (2.38 KB, application/octet-stream), 2010-01-26 15:26 UTC, Tomas M.
dmesg, gpu_dump output and Xorg log - not original reporter (103.36 KB, application/gzip), 2010-01-31 03:36 UTC, Tobias Diedrich
Intel_gpu_dump output (48.79 KB, application/x-lzma), 2010-02-01 13:25 UTC, Petar Velkovski

Description Tobias Jakobi 2009-12-06 08:30:24 UTC
Created attachment 31780 [details]
xorg log

Hi there,

I just experienced a severe crash of xorg-server, which triggered the standard restart of the server - however it didn't come back...

Some information about my system:
kernel: vanilla-2.6.32
libdrm: latest git master
mesa: latest git master
xf86-video-intel: latest git master
xorg-server: 1.7.3
gfx hardware: 00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03)

Xorg crashes with this error message:
Failed to map batchbuffer: Input/output error
(attaching entire xorg log...)

The kernel log contains a lot of:
[drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
(and probably more, again attaching the complete log)

Greets,
Tobias
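
When a kernel log contains many repeated DRM errors like the one above, a per-message count makes it easier to see which error dominates and which one appeared first. The following is a minimal sketch, not part of the original report; the function name count_drm_errors is hypothetical, and the sample lines are copied from logs quoted in this bug.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
DRM_ERROR = re.compile(r"\[drm:(\w+)\] \*ERROR\* (.+)")


def count_drm_errors(log_text):
    """Count each distinct (function, message) pair reported as a DRM error."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DRM_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


sample = """\
[drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
"""

# Print the most frequent errors first.
for (function, message), n in count_drm_errors(sample).most_common():
    print(f"{n:4d}  {function}: {message}")
```

In practice the input would be the text of dmesg or kern.log rather than the inline sample.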
Comment 1 Tobias Jakobi 2009-12-06 08:30:52 UTC
Created attachment 31781 [details]
kernel log / dmesg
Comment 2 Tobias Jakobi 2009-12-06 10:27:12 UTC
Managed to reliably trigger this by restoring my Seamonkey (Mozilla suite) session, which resulted in a bunch of tabs opening at the same time (and rendering). This either crashes or freezes X (VT switch still works) and the kernel log looks similar.

Going back to intel-2.9.1 fixes the issue, adding regression to keywords.
Comment 3 Chris Wilson 2009-12-07 03:29:50 UTC
Tobias, after a GPU hang such as the one you have here, we actually need a GPU dump. Could you please follow the guide at http://intellinuxgraphics.org/how_to_report_bug.html to grab a dump and additional information? Thanks.
Comment 4 Tobias Jakobi 2009-12-07 06:15:08 UTC
This isn't working for me:

$ intel_gpu_dump
intel_gpu_dump: Couldn't find i915 debugfs directory.

Is debugfs mounted? You might try mounting it with a command such as:

	sudo mount -t debugfs debugfs /sys/kernel/debug

$ cat /sys/module/drm/parameters/debug
6

$ mount
debugfs on /sys/kernel/debug type debugfs (rw)

$ zgrep DEBUG_FS /proc/config.gz 
CONFIG_DEBUG_FS=y

I'm omitting the lsmod output, but I can assure you that drm and i915 are loaded. Any ideas?

Anyway, the lockup still happens even after your recent patch to video-intel, Chris.
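
The checks above show debugfs mounted while the tool still fails, so it helps to separate "debugfs is not mounted" from "debugfs is mounted but the driver's directory is missing". Below is a minimal sketch of that distinction. It assumes the /proc/mounts field layout (device, mount point, type) and that the DRM debug files live under a dri directory inside the debugfs mount, as in the /sys/kernel/debug/dri/0/ path used later in this report. The function names are hypothetical, and this is not how intel_gpu_dump itself is implemented.

```python
import os


def debugfs_mount_points(mounts_text):
    """Return mount points whose filesystem type is debugfs.

    Expects text in /proc/mounts format: device, mount point, type, options...
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


def dri_debug_dirs(mount_points):
    """Return the <mount point>/dri directories that actually exist."""
    candidates = [os.path.join(point, "dri") for point in mount_points]
    return [path for path in candidates if os.path.isdir(path)]


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            mounts = debugfs_mount_points(handle.read())
    except OSError:
        mounts = []  # not on Linux, or /proc is unavailable
    print("debugfs mounted at:", mounts or "nowhere")
    print("dri debug directories:", dri_debug_dirs(mounts) or "none found")
```

If debugfs is mounted but no dri directory is found, the loaded kernel modules are the next thing to check.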
Comment 5 Tobias Jakobi 2009-12-07 06:37:01 UTC
Oops, sorry... looks like I forgot to run make modules_install.

Dumping is now working, however it was a bit harder to produce a lockup this time. Anyway, as soon as the lockup came (X didn't restart this time) I switched to the console and made the dump. Funny thing is that the cursor was still movable in X. No idea what this means...
Comment 6 Tobias Jakobi 2009-12-07 06:37:47 UTC
Created attachment 31812 [details]
intel_gpu_dump output
Comment 7 Tobias Jakobi 2009-12-07 06:40:48 UTC
Created attachment 31813 [details]
new kernel log

corresponds to the situation where the dump was made
Comment 8 Tobias Jakobi 2009-12-07 06:51:10 UTC
I'd like to add that I can also see minor visual corruption, e.g. in fonts. I think this was mentioned in the other bug report.
Comment 9 Tobias Jakobi 2009-12-13 12:06:19 UTC
Just did a git pull for xf86-video-intel, libdrm and mesa, but the hang still occurs.

Considering that there is some stabilization work going on for a new release, I'm bumping the severity to critical. It's quite random when the hang kicks in. Sometimes it takes about half an hour, sometimes you're done in five minutes (or less), so this makes the driver pretty much unusable for me.

Anything else I can test? Does upgrading the kernel to some of the drm-next-xyz branches make sense?

Greets,
Tobias
Comment 10 Petar Velkovski 2009-12-13 23:14:00 UTC
Created attachment 32054 [details]
Xorg log
Comment 11 Petar Velkovski 2009-12-13 23:14:38 UTC
Created attachment 32055 [details]
kern.log
Comment 12 Petar Velkovski 2009-12-13 23:20:11 UTC
I am having the same experience on my computer. My system information:

System environment:
-- chipset:Intel Corporation 82945G/GZ Integrated Graphics Controller
-- system architecture: i686
-- xf86-video-intel:2.9.99.901+git20091209.093bb9eb
-- xserver:7.4
-- mesa:7.7-rc2
-- libdrm:2.4.16
-- kernel:2.6.32-020632-generic kernel
-- Linux distribution:Ubuntu 9.10
-- Machine or mobo model:Shuttle K45
-- Display connector:VGA

As for reproducing steps, I can't find any. The display usually hangs when I
try to close a web page in firefox with flash video streamed in it. But this
might be related also to the upgrade of the flash plugin two days ago. Also got
the freezing while working in blender. The last freeze occurred only by turning
the mouse wheel in Thunderbird.
Comment 13 Petar Velkovski 2009-12-13 23:23:53 UTC
More specific information about my mesa version: 7.7.0~git20091211+mesa-7-7-branch.8413a3ae
Comment 14 Petar Velkovski 2009-12-15 05:32:49 UTC
Created attachment 32081 [details]
Intel gpu dump from the last crash
Comment 15 Petar Velkovski 2009-12-15 05:33:34 UTC
Created attachment 32082 [details]
dmesg output
Comment 16 Petar Velkovski 2009-12-17 02:02:23 UTC
Created attachment 32141 [details]
Intel GPU dump 2

These crashes are frequent! The latest crash happened with mesa 7.0~git20091216+mesa-7-7-branch.bf75ee9c. Why is this bug's priority marked as "medium"? Sending another Intel GPU dump.
Comment 17 Petar Velkovski 2009-12-17 02:09:09 UTC
I also have a question: is this related to the intel driver or to mesa? Tobias, do you have any idea about this?
Comment 18 Tobias Jakobi 2009-12-17 03:14:51 UTC
I would blame it on either libdrm or xf86-video-intel, if we can assume that the bug isn't in the kernel DRM module code.

I don't see mesa producing this issue, since I don't need any 3D application to cause the hang. Just regular 2D rendering does it for me. Since I don't use any composite stuff or fancy 3D desktop effects the problem should not be mesa but the DDX.

Like I already stated above this issue is a regression, since downgrading video-intel solves the problem. So it's either a new issue inside video-intel or some code changes trigger an (already existing) error in libdrm/DRM kernel module.
Comment 19 Petar Velkovski 2009-12-17 03:35:43 UTC
If it's not mesa, I would vote for libdrm rather than xf86-video-intel. Who knows. Until this problem is picked up by a developer, I don't think we can know for sure. I wonder why there is so little activity on this bug. I am getting at least 10 freezes a day. As for libdrm, I am using libdrm 2.4.16+git20091211.edc77dd2. I'll try downgrading it and see if that helps.
Comment 20 Tobias Jakobi 2009-12-17 03:53:12 UTC
I have both mesa and libdrm at git master tip. Only video-intel is at version 2.9.1 - once I upgrade to git master I get the issues.

@Petar: Why exactly do you vote for libdrm?
Comment 21 Petar Velkovski 2009-12-18 03:08:56 UTC
I was probably wrong. I noticed the freezes after upgrading libdrm on the 12th of December. Unfortunately there was a big rendering problem with the intel driver at the time, so I tested a few versions (updates from git) only for the rendering problem. I downgraded libdrm and the corruption occurred. Now I have upgraded to the latest libdrm and downgraded to intel driver 2.9.0+git20091111.dbb68168. So far no freezes. As with the rendering problems, this bug might have been introduced after commit dbb68168dc909ab2ec1d935322c3fd8581e666f1 Revert "configure: make --disable-dri work even if the server supports DRI". The last driver I was using, which still had the problem, contains 46 commits after this one. Good luck finding the culprit :)
Comment 22 Petar Velkovski 2009-12-19 06:02:04 UTC
The driver I have been using for a day and a half, built from commits up to and including dbb68168dc909ab2ec1d935322c3fd8581e666f1, is not showing any signs of this bug.
Comment 23 Petar Velkovski 2009-12-19 06:23:56 UTC
Tobias, do you have any experience in reverting commits?
Comment 24 Tobias Jakobi 2009-12-19 06:51:37 UTC
It's very unlikely that commit dbb68168dc909ab2ec1d935322c3fd8581e666f1 is the culprit, since I don't use --disable-dri at all and the commit mainly touches the autotools scripts.

Concerning commit reversion: You just use git-revert with the commit id as parameter. It's all documented in the manpage. You probably want to add the -n option.
Comment 25 Petar Velkovski 2009-12-19 07:48:08 UTC
(In reply to comment #24)
> It's very unlikely that commit dbb68168dc909ab2ec1d935322c3fd8581e666f1 is the
> culprit, since I don't use --disable-dri at all and the commit mainly touches
> the autotools scripts.
> 
> Concerning commit reversion: You just use git-revert with the commit id as
> parameter. It's all documented in the manpage. You probably want to add the -n
> option.
> 

I am not saying that commit dbb68168dc909ab2ec1d935322c3fd8581e666f1 is the culprit, but that the bug was introduced afterwards. I was asking you about reverting commits because I have an idea of how you could test for the problematic commit, since you can reliably reproduce the bug. This might be a stupid idea, but why not try it anyway. You should use a "binary search". Add, let's say, 20 commits after commit dbb68168dc909ab2ec1d935322c3fd8581e666f1 and test. If there is a problem, then drop the last 10 and test again. I hope you understand what I'm trying to say. This might be a naive approach that won't work if the commits are interdependent, but it's a lot better than adding 40+ commits one at a time. Then again, it might be a totally stupid idea :). I am not testing with git. I am using prebuilt testing packages from the Ubuntu driver testing repository. If you decide to test this way, be careful about this commit:
commit 3f11bbec420080151406c203af292e55177e77d1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Nov 29 21:39:41 2009 +0000

    uxa-glyphs: Enable TILING_X on glyph caches.
as it introduced another bug (25406) and was reverted afterwards.
Comment 26 Tobias Jakobi 2009-12-19 08:10:27 UTC
What you mean is a bisection, which is already fully implemented in git bisect.
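
For readers unfamiliar with the idea, the binary search that git bisect automates can be sketched in a few lines. This is only an illustration, not the git implementation: the function first_bad_commit, the commit labels and the is_bad callback are hypothetical, and is_bad stands in for "build this revision and try to reproduce the hang". The sketch assumes the oldest commit is good, the newest is bad, and there is a single good-to-bad transition with every intermediate commit testable, which is not guaranteed for a real driver history.

```python
def first_bad_commit(commits, is_bad):
    """Return the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(commits[middle]):
            bad = middle    # the culprit is at or before the midpoint
        else:
            good = middle   # the culprit is after the midpoint
    return commits[bad]


# Hypothetical history: one known-good base plus 46 later commits,
# where the (made-up) culprit is commit number 30.
history = [f"commit-{n:02d}" for n in range(47)]
tested = []


def reproduces_hang(commit):
    tested.append(commit)
    return int(commit.split("-")[1]) >= 30


print(first_bad_commit(history, reproduces_hang))
print(f"{len(tested)} builds tested instead of up to {len(history) - 2}")
```

The number of builds to test grows with the logarithm of the number of commits, which is why bisecting is attractive even when each test takes hours.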
Comment 27 Petar Velkovski 2009-12-19 09:39:25 UTC
Super duper if that function is already implemented. The question is do you have time to do the bisecting? If you can locate the problematic commit, we can write to the developer responsible for it, because as far as I can see, not much attention was given to this bug although they already started testing "release candidates" for the next version of the intel driver. Sorry I can't be of more help right now, I don't have much experience in building the driver by myself from git. But if you can find the problematic commit I can remove it from the testing driver packages already built for the Ubuntu distribution that I'm using and do the testing.
Comment 28 Petar Velkovski 2009-12-21 08:53:01 UTC
Locating the bug by bisecting (manually) is not working for me due to another serious bug present between commit dbb68168dc909ab2ec1d935322c3fd8581e666f1 and commit 37f631d669c165c4fb56ccd7a6fc0a432f453b52. Tobias, can you try to trigger the bug with a driver containing commits up to and including commit 47416b1eea09b238a997636d35998d71e0d18161 (i965: Maximum number of vertices per composite is 24, not 18)? I've been using this build for 3 hours with no crashes so far, except for a freeze I'm getting with full-screen flash videos (which is yet another bug).
Comment 29 Tobias Jakobi 2009-12-21 09:06:53 UTC
Sorry, but I have no time to debug this bug - bisecting would take an immense amount of time since testing the current bisection copy requires at least some hours of testing with different applications (in case the driver doesn't crash immediately).

The other problem you already mentioned yourself: it's not guaranteed that the bisection copy compiles cleanly, or that it doesn't introduce another serious issue. I only do bisection for wine; mesa and most X components are hell to bisect.

At least I'm going to remain at 2.9.1 till Chris shows up and gives me new instructions. I hope the GPU dump gives the intel devs some clue which commit could be the culprit. Once we have a list of possible commits I can do tests by reverting the particular commits.
Comment 30 Chris Wilson 2009-12-21 09:11:06 UTC
None of the dumps so far actually capture the erroneous batch buffer. I have a patch in the works that should capture the error better, and once I've returned to civilisation (and a reliable internet connection) I'll make it available and hopefully it will find the culprit causing these bugs. Thanks for your patience and perseverance.
Comment 31 Robert Hooker (Sarvatt) 2009-12-22 13:24:06 UTC
I have noticed reverting libdrm to a 11-25-2009 checkout fixes the problem here. The problem started somewhere between the commits on 11-30 and the 2.4.16 release on 12-03. In my case browsing a directory with about 800 video thumbnails in nautilus triggers a crash reliably. Scrolling in chrome/firefox also seems to be a common trigger for it.
Comment 32 Robert Hooker (Sarvatt) 2009-12-22 17:30:27 UTC
Created attachment 32257 [details]
intel_gpu_dump - not original reporter

Acer Aspire One AOA150 intel_gpu_dump during a hang, in case it helps.

Versions at the time of this dump:
Ubuntu 10.04 Lucid
xserver: master 1.7.99.2 20091217 checkout at 0cb638dc
libdrm: master 2.4.17 20091221 checkout at fdb33d56
mesa: master 7.8.0 20091221 checkout at 71678a7
kernel: drm-intel-next up to commit cf74ecbbff3e3b45bae61d28d2220f74d853e2f0 (drm/i915: remove render reclock support) Also happens with stock 2.6.32, and 2.6.33-rc1.

Xorg.0.log:
[14925.959682] (WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
[14925.960043] (WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
[14925.961466] (EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
(repeat)

dmesg:
[ 5490.524107] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5490.524133] render error detected, EIR: 0x00000000
[ 5490.524316] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 483783 at 483781)
[ 5490.526299] [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -5
(repeat)

intel_gpu_top shows Sampler Cache, Filtering, Bypass FIFO, Pixel shader and Color calculator at 100% during a hang.
Comment 33 Robert Hooker (Sarvatt) 2009-12-22 17:39:39 UTC
(In reply to comment #32)

> [ 5490.526299] [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns
> -5

Sorry, I just noticed that this drm-intel-next based kernel does not give the "Execbuf while wedged" error and instead prints this message in its place, but the crash seems otherwise identical in both cases. I'll upload a dump from a 2.6.32 kernel instead the next time I crash, if that would be more useful.
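
The "returns -5" in the kernel messages and the "Input/output error" in the Xorg log describe the same failure: by kernel convention a negative return value is a negated errno code, and errno 5 is EIO. A minimal sketch for decoding such values from log text follows. The function name decode_return_codes is hypothetical, and it assumes the machine running the script uses the same errno numbering as the kernel that produced the log, which holds for Linux on x86.

```python
import errno
import os
import re

# Matches the numeric part of messages such as "... returns -5".
RETURNS = re.compile(r"returns (-\d+)")


def decode_return_codes(log_text):
    """Map each 'returns -N' value found in the text to its errno name."""
    decoded = {}
    for value in RETURNS.findall(log_text):
        code = -int(value)
        decoded[int(value)] = errno.errorcode.get(code, "unknown")
    return decoded


sample = "[drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -5"

for value, name in decode_return_codes(sample).items():
    # os.strerror gives the human-readable text that userspace tools print.
    print(value, name, os.strerror(-value))
```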
Comment 34 Robert Hooker (Sarvatt) 2009-12-23 07:52:13 UTC
Created attachment 32264 [details]
intel_gpu_dump 2 - not original reporter

(In reply to comment #32)

New dump, same versions and error messages as in comment #32
Comment 35 Robert Hooker (Sarvatt) 2009-12-29 10:40:30 UTC
After a few days of bisecting I have at least narrowed it down to two commits. The libdrm that has actually been stable is 2.4.16 with these two commits reverted:

intel: Check and propagate errors from building reloc-tree
792fed1e2460f96459141b5a628dd5ab4fbb87db
intel: Repeat execbuffer after EINTR
b73612e4fd69565aa2c5c2e9677f3e0af1945f7d

Reverting just the 792fed one didn't fix it but I've had 31 hours uptime with both reverted. I haven't tried with just b73612 reverted yet.
Comment 36 Robert Hooker (Sarvatt) 2009-12-30 17:46:22 UTC
8 hours in on 2.4.17 with just this commit reverted, I was *really* lucky to last this long without it reverted so it might just be this commit that was causing the problem. libdrm with both of those commits I mentioned in the last comment reverted was 100% stable though after 3 days uptime.

intel: Repeat execbuffer after EINTR
b73612e4fd69565aa2c5c2e9677f3e0af1945f7d
Comment 37 Petar Velkovski 2009-12-30 22:13:27 UTC
(In reply to comment #36)
> 8 hours in on 2.4.17 with just this commit reverted, I was *really* lucky to
> last this long without it reverted so it might just be this commit that was
> causing the problem. libdrm with both of those commits I mentioned in the last
> comment reverted was 100% stable though after 3 days uptime.
> 
> intel: Repeat execbuffer after EINTR
> b73612e4fd69565aa2c5c2e9677f3e0af1945f7d
> 

Robert if you applied the patch in  libdrm 2.4.17+git20091230.c5c503b5-0ubuntu0sarvatt~karmic I can tell you it's NOT working. I upgraded libdrm and tried with the latest intel driver from xorg edgers and Xorg crashed again (while playing some flash game on facebook).  Chris Wilson where the hell are you? Next time you decide to travel somewhere, make sure they have good internet :)
Comment 38 Chris Wilson 2009-12-31 00:25:43 UTC
(In reply to comment #36)
> 8 hours in on 2.4.17 with just this commit reverted, I was *really* lucky to
> last this long without it reverted so it might just be this commit that was
> causing the problem. libdrm with both of those commits I mentioned in the last
> comment reverted was 100% stable though after 3 days uptime.
> 
> intel: Repeat execbuffer after EINTR
> b73612e4fd69565aa2c5c2e9677f3e0af1945f7d


Hmm.

That's extremely unlikely to be the cause, it might just be conceivable that somebody is doing something extremely fishy in a signal handler that is being run when EINTR is being provoked. But, seriously, this looks like a wild goose chase. 

Comment 39 Robert Hooker (Sarvatt) 2009-12-31 16:33:28 UTC
Indeed I eventually crashed about 14 hours in with just that revert, and 792fed1e2460f96459141b5a628dd5ab4fbb87db doesn't revert cleanly from 2.4.17. It's hard to bisect because the first 3/4ths of the commits from 11-30 to 12-03 have other freezing problems on their own in that I can crash it loading a large image in firefox so it's hard to have enough uptime to actually get this crash. 2.4.16 with those 2 commits reverted was stable for 3 days though, and I haven't had more than 24 hours uptime since 2.4.16 was released because of this crash.
Comment 40 Petar Velkovski 2009-12-31 23:37:29 UTC
Bisections sound good in theory but in practice they are not working. At least not in this case, most probably because some of the commits touch the same code. I am not sure how graphics card driver development works, but from what I've seen so far it's a devil's business and it's incredibly complex. How come these bugs don't show up on the developers' machines, and yet they show up when we use the driver? Isn't it possible to build testing applications that force different portions of the driver code to be tested systematically and in one pass when we install the driver on our machines (and run the testing application)? I even wonder about the "http://intellinuxgraphics.org/how_to_report_bug.html" link. Can't this information gathering be automated by running a single script? Sorry for this little rambling, I know this is a place for bug reporting and not for discussions about the development model :)
Comment 41 Chris Wilson 2010-01-05 02:43:00 UTC
Created attachment 32455 [details] [review]
Record batch buffer at time of error

This is the kernel patch that I hope Eric will accept into drm-intel-next that should capture the batch buffer that is triggering this error. After applying this patch (and the hang occurs) can you upload the contents of /sys/kernel/debug/dri/0/i915_error_state, please?
Comment 42 Petar Velkovski 2010-01-08 13:00:04 UTC
I've been trying to trigger this bug with the drm-intel-next kernel from 2010-01-06 and had no success after using my system for two and a half days (with a few system suspends during that time). Today I tried with the vanilla 2.6.32.3 kernel and the bug reappeared after only 3 minutes. I am not even sure whether the patch was included in the drm-intel-next kernel from the 6th of January. Now I'm trying with a drm-intel-next kernel built today (8th of January). If the bug does not reappear while using the drm-intel-next kernel, is that because the bug is in the kernel and not in the intel driver? Any suggestions, Chris?
Comment 43 Chris Wilson 2010-01-08 13:46:52 UTC
Petar, that would be a logical conclusion if drm-intel-next proves stable. (At the very least it means that -intel is doing something that previous kernels struggle with, something we should be wary of.) I'm at a loss to think of which recent kernel changes may have impacted -intel. And for the record, I've not yet written a version of the batch buffer capture patch that satisfies Eric, so it is not yet included in drm-intel-next.

Thanks for your patience and continued testing.
Comment 44 Petar Velkovski 2010-01-08 15:13:08 UTC
Too bad your patch didn't get into drm-intel-next. So the following info might be totally unusable to you, but I am giving it for reference because it is somewhat different (well, at least the dmesg error text).

Linux kernel: drm-intel-next 2.6.33-997 (built at 8th of January)
Libdrm: 2.4.17+git20091230.c5c503b5
xserver-xorg-video-intel: 2.10.0 git20100104.09103514
x.org: 7.4

When this bug is triggered dmesg shows:
[drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns -5

This is the output of /sys/kernel/debug/dri/0/i915_error_state:

Time: 1262991388 s 668451 us
EIR: 0x00000000
  PGTBL_ER: 0x00000000
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x54f00006
  INSTDONE: 0x7fc00081
  ACTHD: 0x0593c138
Comment 45 Petar Velkovski 2010-01-08 15:23:39 UTC
It seems that Santa was protecting my computer from freezing for 2 days but now he is gone. Got the X.org freeze with drm-intel-next 2.6.32-997 from the 6th of January too. Same dmesg message, similar /sys/kernel/debug/dri/0/i915_error_state output:
Time: 1262992580 s 316716 us
EIR: 0x00000000
  PGTBL_ER: 0x00000000
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x54f00006
  INSTDONE: 0x7fc00081
  ACTHD: 0x0e1f6138

Once again going back to the rock solid (I hope) xserver-xorg-video-intel_2.9.0+git20091111.dbb68168 :)
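
Comparing error-state dumps by eye is error-prone, so a small script can list the registers that differ between two captures. The sketch below parses the "NAME: 0x..." lines of the two dumps quoted in comments 44 and 45 and reports the differences. The function names are hypothetical, and the parser only understands the simple register lines shown in this report, not any richer format that later kernels may emit.

```python
import re

# Matches register lines such as "  IPEHR: 0x54f00006"; the "Time:" line
# carries no hexadecimal value and is skipped.
REGISTER = re.compile(r"^\s*(\w+):\s*(0x[0-9a-fA-F]+)\s*$")


def parse_error_state(text):
    """Return a dict of register name -> integer value."""
    registers = {}
    for line in text.splitlines():
        match = REGISTER.match(line)
        if match:
            registers[match.group(1)] = int(match.group(2), 16)
    return registers


def changed_registers(first, second):
    """Return {name: (first value, second value)} for registers that differ."""
    return {
        name: (first[name], second[name])
        for name in first
        if name in second and first[name] != second[name]
    }


dump_comment_44 = """\
Time: 1262991388 s 668451 us
EIR: 0x00000000
  PGTBL_ER: 0x00000000
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x54f00006
  INSTDONE: 0x7fc00081
  ACTHD: 0x0593c138
"""

dump_comment_45 = """\
Time: 1262992580 s 316716 us
EIR: 0x00000000
  PGTBL_ER: 0x00000000
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x54f00006
  INSTDONE: 0x7fc00081
  ACTHD: 0x0e1f6138
"""

differences = changed_registers(
    parse_error_state(dump_comment_44), parse_error_state(dump_comment_45)
)
for name, (before, after) in differences.items():
    print(f"{name}: {before:#010x} -> {after:#010x}")
```

For these two dumps only ACTHD differs; IPEHR and INSTDONE are identical in both captures.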
Comment 46 Petar Velkovski 2010-01-08 15:33:36 UTC
Robert Hooker do you have any connections with Ubuntu kernel builders? If you do, can you somehow persuade them to build a kernel package of drm-intel-next that includes Chris Wilson's patch? Thanks in advance :)
Comment 47 Tobias Jakobi 2010-01-18 12:58:51 UTC
Created attachment 32701 [details]
latest intel_gpu_dump output

Sorry, I was busy so I couldn't report back sooner.

The issue still exists with latest libdrm, mesa and xf86-video-intel git master. I'm using vanilla sources 2.6.32.3 with Chris' patch applied.

$ cat i915_error_state
Time: 1263837225 s 979362 us
EIR: 0x00000010
  PGTBL_ER: 0x00000010
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x00000000
  INSTDONE: 0x03ffffc0
  ACTHD: 0x00000000

I'm also going to attach the new output of intel_gpu_dump.
Comment 48 Tomas M. 2010-01-19 07:33:53 UTC
i'm affected by this, and so are people from http://bugzilla.kernel.org/process_bug.cgi

i first thought it was a kernel issue (not that experienced with debugging).

then i started testing different versions of xf86-video-intel and came to report here.

a reliable way to trigger this bug is to enable compiz and setup its resize method as "normal"(through compiz config settings manager) it should upgrade windows contents on the fly while resizing.

open chromium or firefox and start resizing like crazy. takes a few minutes to trigger it.

00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)


xf86-video-intel 2.9.1 works.
xf86-video-intel 2.9.99.901 does not.

tried to bisect, but half of the commits between those versions break the build, or Xorg badly.

libdrm 2.4.17
Comment 49 Colin Guthrie 2010-01-19 08:57:05 UTC
I think Tomas meant: http://bugzilla.kernel.org/show_bug.cgi?id=15004
(using my comment as an excuse to add me to CC :p)
Comment 50 Chris Wilson 2010-01-19 09:04:50 UTC
(In reply to comment #48)
> a reliable way to trigger this bug is to enable compiz and setup its resize
> method as "normal"(through compiz config settings manager) it should upgrade
> windows contents on the fly while resizing.

I think you have a different bug, since the original report is from a system not using a compositing manager. The issue with driver bugs is that they all have superficially identical symptoms and we need to catch the driver in the act of crashing the hardware to be able to find and distinguish between bugs. To grab a dump follow the instructions at http://intellinuxgraphics.org/how_to_report_bug.html . One issue faced by the reporters of the original bug is that the batchbuffer has been cleared from the lists prior to dumping the GPU contents, so we have not yet been able to identify the cause.
Comment 51 Tomas M. 2010-01-19 09:19:28 UTC
(In reply to comment #50)
> (In reply to comment #48)
> > a reliable way to trigger this bug is to enable compiz and setup its resize
> > method as "normal"(through compiz config settings manager) it should upgrade
> > windows contents on the fly while resizing.
> 
> I think you have a different bug, since the original report is from a system
> not using a compositing manager. The issue with driver bugs is that they all
> have superficially identical symptoms and we need to catch the driver in the
> act of crashing the hardware to be able to find and distinguish between bugs.
> To grab a dump follow the instructions at
> http://intellinuxgraphics.org/how_to_report_bug.html . One issue faced by the
> reporters of the original bug is that the batchbuffer has been cleared from the
> lists prior to dumping the GPU contents, so we have not yet been able to
> identify the cause.
> 

i will try to catch a gpu dump, but i'm almost certain it's the same bug. i said i could trigger this easily with compiz's resize plugin, but the issue is also present with metacity and no 3d rendering whatsoever (this is the first test i did, weeks ago). sorry if i left this bit out. waiting for the driver to trip takes a LOT of time, which is why i posted a reliable way to trigger this.
Comment 52 Tomas M. 2010-01-19 09:32:18 UTC
Created attachment 32719 [details]
Tomas M. intel gpu dump
Comment 53 Tobias Jakobi 2010-01-19 10:33:48 UTC
(In reply to comment #50)
> One issue faced by the
> reporters of the original bug is that the batchbuffer has been cleared from the
> lists prior to dumping the GPU contents, so we have not yet been able to
> identify the cause.
> 

Huh? I thought your kernel patch should solve this issue, or what is it for?
Comment 54 Chris Wilson 2010-01-19 10:57:05 UTC
The information that the patch captures will be in i915_error_state. You will in fact need an updated patch to capture the batchbuffer on an i8xx (since BBADDR didn't exist for that generation of GPUs).
Comment 55 Chris Wilson 2010-01-19 10:59:13 UTC
Created attachment 32723 [details] [review]
Record batch buffer at time of error [v2]
Comment 56 Tobias Jakobi 2010-01-19 11:15:50 UTC
OK, so my information from comment #47 should suffice, right? Since my hardware is i915-based.
Comment 57 Chris Wilson 2010-01-19 12:35:49 UTC
Hmm, that i915_error_state would imply that the active list was indeed empty at the time that the error was captured which as you can imagine is not exactly a lot of information as to the cause of the error.
Comment 58 Tobias Jakobi 2010-01-19 12:37:14 UTC
So what should I do then? This is what I get after the X output gets stuck.
Comment 59 Vasily Khoruzhick 2010-01-19 14:21:07 UTC
I'm getting the following log on 945gm (very easy to reproduce with kde 4.4rc1):

Jan 16 20:09:23 anarsoul-laptop kernel: [86786.087142] [drm:i915_gem_object_pin] *ERROR* Failure to install fence: -28
Jan 16 20:09:23 anarsoul-laptop kernel: [86786.168476] [drm:i915_gem_object_pin] *ERROR* Failure to install fence: -28
Jan 16 20:09:23 anarsoul-laptop kernel: [86786.168481] [drm:i915_gem_execbuffer] *ERROR* Failed to pin buffer 19 of 26, total 16060416 bytes: -28
Jan 16 20:09:23 anarsoul-laptop kernel: [86786.168486] [drm:i915_gem_execbuffer] *ERROR* 1039 objects [22 pinned], 153628672 object bytes [27529216 pinned], 28577792/260308992 gtt bytes
Jan 16 20:09:25 anarsoul-laptop kernel: [86787.832115] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 16 20:09:25 anarsoul-laptop kernel: [86787.832130] render error detected, EIR: 0x00000000
Jan 16 20:09:25 anarsoul-laptop kernel: [86787.832134] i915: Waking up sleeping processes
Jan 16 20:09:25 anarsoul-laptop kernel: [86787.832186] reboot required
Jan 16 20:09:25 anarsoul-laptop kernel: [86787.832206] [drm:i915_wait_request] *ERROR* i915_wait_request returns -5 (awaiting 14478201 at 14478197)
Jan 16 20:09:25 anarsoul-laptop kernel: [86787.832668] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged

Is it the same bug? Should I file another bug report?
Comment 60 Tomas M. 2010-01-19 19:10:55 UTC
(In reply to comment #55)
> Created an attachment (id=32723) [details]
> Record batch buffer at time of error [v2]
> 

hi chris, if you still believe it's a different bug, please say so and i will open a new bug.

i tried your v2 patch in order to collect more info, but 2.6.33-rc4 with this patch and xf86-video-intel 2.9.99 produces a hard lock. 

- ssh: no route to host
- sysrq-REISUB non-responding
- external VGA display shuts off. internal LVDS screen frozen.

only way to power off is to keep power button pressed for a few seconds. so i could not collect any info :(

tried this twice just to be sure.
Comment 61 Vasily Khoruzhick 2010-01-25 01:41:52 UTC
Chris, do you need any additional info?
Comment 62 Vasily Khoruzhick 2010-01-26 13:00:22 UTC
Created attachment 32829 [details]
i915_error_state

I got this error state with the first version of the patch; the second doesn't apply to 2.6.32.6.
Comment 63 Vasily Khoruzhick 2010-01-26 13:49:19 UTC
It seems A17-affected machines (usually those that use 2 memory modules of the same size) are not affected by this bug :)
Comment 64 Tomas M. 2010-01-26 14:39:27 UTC
(In reply to comment #63)
> It seems A17-affected machines (usually those who use 2 memory modules of same
> size) are not affected by this bug :)
> 

i cannot test this since my laptop has one so-dimm slot only. :( will try and test the first version of the patch.
Comment 65 Petar Velkovski 2010-01-26 15:16:40 UTC
(In reply to comment #63)
> It seems A17-affected machines (usually those who use 2 memory modules of same
> size) are not affected by this bug :)
> 

Sorry for this stupid question, are you referring to machines with two RAM modules? Mine has two slots, each holding 1GB of RAM and I'm hit by the bug. Then again as far as I know there are different BIOS settings about the method and portion of the RAM that is going to be used as a video memory. Can you be more specific about your comment? 
Comment 66 Tomas M. 2010-01-26 15:26:07 UTC
Created attachment 32836 [details]
got this with the first patch and kernel 2.6.32.6
Comment 67 Vasily Khoruzhick 2010-01-26 23:42:55 UTC
(In reply to comment #65)

> Sorry for this stupid question, are you referring to machines with two RAM
> modules? Mine has two slots, each holding 1GB of RAM and I'm hit by the bug.
> Then again as far as I know there are different BIOS settings about the method
> and portion of the RAM that is going to be used as a video memory. Can you be
> more specific about your comment? 

Uh, then it's something else... My friend didn't hit this bug, and he has the same software (Gentoo ~x86); the only difference is in RAM: he has 2x1GB, and my machine has 512MB+2GB.

A17-affected machines are machines with 2 memory modules of the same size (in this case memory works in interleaved mode).
Quote from i915_gem_tiling.c:

On mobile 9xx chipsets, channel interleave by the CPU is
determined by DCC.  For single-channel, neither the CPU
nor the GPU do swizzling.  For dual channel interleaved,
the GPU's interleave is bit 9 and 10 for X tiled, and bit
9 for Y tiled.  The CPU's interleave is independent, and
can be based on either bit 11 (haven't seen this yet) or
bit 17 (common).
Comment 68 Tomas M. 2010-01-27 13:28:27 UTC
back on the issue.

since the error logs and dumps of the GPU don't provide any additional info on what is causing this, and half of the commits from 2.9.1 to 2.9.99 break the build or xorg itself (making a bisection quite difficult), is there a way to determine a list of commits that could be introducing this regression?

in short: is there a dev willing to provide a list of suspects? i'm willing to test them all out.
Comment 69 Tobias Jakobi 2010-01-27 13:42:02 UTC
Adding some bits of info: My system also is using both RAM slots (2 x 1GB) and yes, I have the GPU hang issues.
Comment 70 Christian Schafmeister 2010-01-28 05:11:05 UTC
I can't confirm the RAM theory. My ThinkPad X41 uses one 512MB and one 1GB RAM module, and I'm also affected by this annoying bug.
Comment 71 Tobias Diedrich 2010-01-31 03:27:09 UTC
(In reply to comment #33)
> (In reply to comment #32)
> 
> > [ 5490.526299] [drm:i915_gem_execbuffer] *ERROR* i915_gem_do_execbuffer returns
> > -5
> 
> Sorry, I just noticed it does not give the Execbuff while wedged error with
> this drm-intel-next based kernel and instead says this now in place of it, but
> the crash seems identical otherwise in both cases. I'll upload a dump from a
> 2.6.32 kernel instead when I crash next if it would be more useful.

BTW, back when it said "Execbuf while wedged" I could recover the GPU by doing
a suspend-resume cycle on my X41. After the resume the kernel would say something along the lines of "VT switch, reenabling wedged GPU even though this might not work", and after that it would recover.

With newer kernels that say "i915_gem_do_execbuffer returns -5" I can only reboot to recover...
Comment 72 Tobias Diedrich 2010-01-31 03:35:37 UTC
(In reply to comment #19)
>  I wonder why is there so little activity for this bug. I am getting
> at least 10 freezes a day.

I'm also seeing this, but more like 0-3 freezes a day.
Most recently I upgraded to the git version of xf86-video-intel, but I still see it. When I first started using KMS everything worked fine, until (most likely) a newer xf86-video-intel version entered Debian/testing.
For me, though, this bug is independent of KMS (in contrast to https://bugs.freedesktop.org/show_bug.cgi?id=26346): I also get these GPU hangs without using KMS.
Comment 73 Tobias Diedrich 2010-01-31 03:36:42 UTC
Created attachment 32943 [details]
dmesg, gpu_dump output and Xorg log - not original reporter
Comment 74 Tobias Jakobi 2010-01-31 06:11:11 UTC
Has anyone already tested latest drm-intel-next git (http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=shortlog;h=drm-intel-next)?
Comment 75 Petar Velkovski 2010-02-01 13:20:31 UTC
(In reply to comment #74)
> Has anyone already tested latest drm-intel-next git
> (http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=shortlog;h=drm-intel-next)?
> 

I did. Xorg crashed again.

dmesg:
...
[38665.264015] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[38665.264026] render error detected, EIR: 0x00000000
[38665.264051] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 6436471 at 6436463)

/sys/kernel/debug/dri/0/i915_error_state:
Time: 1265058362 s 761303 us
EIR: 0x00000000
  PGTBL_ER: 0x00000000
  INSTPM: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x54300004
  INSTDONE: 0x7fc44081
  ACTHD: 0x0360dbbc

Also I'm sending another gpu_dump.
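
The two checks done by hand above (looking for the hangcheck message in dmesg and reading fields out of i915_error_state) can be sketched as small shell helpers. This is only an illustration: gpu_hang_status and error_state_field are invented names, and both just filter whatever text they are given on stdin.

```shell
# Hypothetical helpers for triaging a hang report; both read text on stdin.

# Prints "hung" if the hangcheck message appears in the log text, else "ok".
gpu_hang_status() {
    if grep -q 'Hangcheck timer elapsed'; then
        echo hung
    else
        echo ok
    fi
}

# Prints the value of one named field (e.g. ACTHD) from i915_error_state text.
# The match is anchored, so asking for EIR does not also match IPEIR.
error_state_field() {
    sed -n "s/^[[:space:]]*$1:[[:space:]]*//p"
}
```

Typical use would be `dmesg | gpu_hang_status` and `error_state_field ACTHD < /sys/kernel/debug/dri/0/i915_error_state` (the latter needs root and a mounted debugfs).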
Comment 76 Petar Velkovski 2010-02-01 13:25:13 UTC
Created attachment 32980 [details]
Intel_gpu_dump output

linux-image-2.6.33-997-generic_2.6.33-997.201001301338
xserver-xorg-video-intel 2.10.0+git20100127.918151a7
Comment 77 Tobias Diedrich 2010-02-04 06:45:22 UTC
FWIW I've found out that I can flip /sys/kernel/debug/dri/0/i915_wedged back to 0 by writing to it.  If I do that after a suspend to ram I can continue using the system without having to reboot.  In case X11 didn't terminate because it got a failed system call I can even continue the X11 session without problems.

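#!/bin/sh
# flush pending writes to disk before suspending
sync
# switch to the text console on VT 1
chvt 1
# suspend to RAM; the script continues here after resume
echo mem > /sys/power/state
# clear the wedged flag (needs root and a mounted debugfs)
echo 0 > /sys/kernel/debug/dri/0/i915_wedged
# switch back to VT 7
chvt 7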

Maybe that helps someone else suffering from this bug...
Comment 78 Tobias Diedrich 2010-02-04 07:07:20 UTC
Regarding the RAM, I'm also using an X41 (Tablet) with one 512mb and one 1gb ram module.

00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03)
-- xf86-video-intel: git 20100205 1a76fa5574e8e8f88ac3518a4e4494e1af301dc1
-- xserver:7.4 (Debian/testing)
-- libdrm:git 20100205 1802e1a4e747b5906d3af10c4a53fd457eddcbb4
-- kernel:2.6.33-rc6
-- Linux distribution:Debian/testing
-- Machine or mobo model:IBM X41 Tablet 1866CTO
-- Display connector:LVDS
Comment 79 Christian Schafmeister 2010-02-04 10:04:34 UTC
Hi!
Your hint looks promising and I want to test it. My problem is that I don't have an entry "i915_wedged" in the /sys/kernel/debug/dri/0/ path. I have a lot of other entries there, but not this one. Is it only available in newer kernel versions? Currently I'm using vanilla-sources 2.5.32. Which version do you use?

Greetings
Christian
Comment 80 Christian Schafmeister 2010-02-04 10:07:31 UTC
Short correction :-)

> Currently I'm using vanilla-sources 2.5.32

It's kernel version 2.6.32

Comment 81 Tobias Diedrich 2010-02-04 21:35:37 UTC
I'm using 2.6.33-rc6.
Note that in some earlier kernel version (2.6.32? 2.6.31?) the kernel would automatically unwedge after resume for me.  So maybe for you it might be sufficient to just do a suspend-resume cycle.
Comment 82 Christian Schafmeister 2010-02-05 03:14:56 UTC
I upgraded to kernel-2.6.33-rc6 and your "hotfix" seems to be working! Thanks a lot!! Finally I can use my laptop as a laptop again (suspend is working!!!!!).

On kernels <= 2.6.32 the suspend-resume cycle didn't fix the problem for me. 

I have two numbered entries under /sys/kernel/debug/dri. I don't know if this is hardware dependent, but I have to set them both to 0 so that my system is stable after a suspend / resume cycle. I'm now running with an uptime of 10 hours and have suspended about 4 times (but none of the suspend cycles was needed because of the bug).

The only bugs left here:
- Sometimes the screen flickers (especially when I play video files). Switching to another vt and back solves the problem. This behaviour occurs ~ once an hour.
- Sometimes there are ugly font issues. Characters don't get rendered correctly. If a character like "a" gets rendered only half, all other "a" characters on the screen have the same issue. I already had this problem in an earlier svn version and don't know what is causing this and how to fix it. I can make a screenshot if it helps solving the problem.

greetings
Christian

(In reply to comment #81)
> I'm using 2.6.33-rc6.
> Note that in some earlier kernel version (2.6.32? 2.6.31?) the kernel would
> automatically unwedge after resume for me.  So maybe for you it might be
> sufficient to just do a suspend-resume cycle.
> 

Comment 83 Paulo Zanoni 2010-02-05 05:35:06 UTC
For me, there is a very reliable way to reproduce this bug:

Open _Epiphany_ web browser and go to the following website:
http://www.powerdeveloper.org/forums/index.php

Then click on the "Developers" forum.

For me sometimes it even crashes before I click the "Developers" forum.

I'm using:
2.6.33-rc6
Xorg 1.7.4
Intel 2.10
libdrm2.4.17

Hope that helps...
Comment 84 Tobias Diedrich 2010-02-05 09:35:40 UTC
(In reply to comment #82)
> I have two numbered entries under /sys/kernel/debug/dri. I don't know if this
> is hardware dependent, but I have to set them both to 0 so that my system gets
> stable after a suspend / resume cycle. I'm now running with an uptime of 10
> hours and suspended about 4 times (but none of the suspend cycles was used
> because of the bug).

For me if I set the entry in directory '0' to 0 the one in directory '64' is also set to 0.
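
Since the two of us see different behaviour for the '0' and '64' directories, a helper that simply writes the same value to every i915_wedged entry it finds covers both cases. A minimal sketch, assuming the debugfs path used in comment 77; set_wedged is a hypothetical name, and writing to the real files needs root and a mounted debugfs.

```shell
# Hypothetical helper: write VALUE to every i915_wedged entry under BASE.
# usage: set_wedged VALUE [BASE]
set_wedged() {
    value=$1
    base=${2:-/sys/kernel/debug/dri}
    for f in "$base"/*/i915_wedged; do
        # Skip the unexpanded glob pattern and any entry we cannot write.
        if [ -w "$f" ]; then
            echo "$value" > "$f"
        fi
    done
    return 0
}
```

After a resume this would be called as `set_wedged 0`. The optional second argument exists only so the function can be tried against a scratch directory first.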

> The only bugs left here:
> - Sometimes the screen flickers (especially when I play video files). Switching

I had that one often with flash, but never with MPlayer.
However since updating to the git version of xf86-video-intel I haven't seen that bug I think.
I still switched MPlayer to the backend scaler now that it is available again, since I'm hoping that will use less power than using the 3D engine to blit the video.  It would be interesting to know if that makes a measurable difference; I did not check that...

> - Sometimes there are ugly font issues. Characters don't get rendered
> correctly. If a character like "a" gets rendered only half, all other "a"
> characters on the screen have the same issue. I already had this problem in an
> earlier svn version and don't know what is causing this and how to fix it. I
> can make a screenshot if it helps solving the problem.

I also see that one, but it only started after upgrading to the git version of xf86-video-intel for me.
With the Debian/testing version I didn't experience that particular bug.
I still have those though:
https://bugs.freedesktop.org/show_bug.cgi?id=26346
http://uguu.de/~ranma/intel_kms_bugs/
Comment 85 Christian Schafmeister 2010-02-06 03:23:55 UTC
Just an addition to Tobias Diedrich's suspend hotfix:

I have to set 

echo 1 > /sys/kernel/debug/dri/0/i915_wedged
echo 1 > /sys/kernel/debug/dri/0/i915_wedged

before suspending. Otherwise suspend mode eats my battery ultra fast. Perhaps this is a thinkpad x41 problem. After resuming I set the same value to 0 again. This works for me to keep the battery drain low.

Greetings
Christian
Comment 86 Tobias Diedrich 2010-02-06 05:24:15 UTC
(In reply to comment #85)
> Just an addition to Tobias Diedrich's suspend hotfix:
> 
> I have to set 
> 
> echo 1 > /sys/kernel/debug/dri/0/i915_wedged
> echo 1 > /sys/kernel/debug/dri/0/i915_wedged
> 
> before suspending. Otherwise suspend mode eats my battery ultra fast. Perhaps
> this is a thinkpad x41 problem. After resuming I set the same value to 0 again.
> This works for me to keep the battery drain low.

Hmm, FWIW I never tested battery drain during suspend.
I don't think I disconnected AC power even once over the last month. (^^;
Comment 87 aix27249 2010-02-09 14:32:05 UTC
This bug also affects me. I described it in kernel bugzilla:
http://bugzilla.kernel.org/show_bug.cgi?id=15188

For me, it occurs only if there was at least one suspend. Usually the bug triggers 0-2 times per day. I'm using the latest kernel (2.6.32.7) and userspace (libdrm 2.4.17, xf86-video-intel 2.10.0, mesa 7.7, xorg-server 1.7.4).
Comment 88 Chris Wilson 2010-02-10 01:58:53 UTC
Everyone whose bug is not fixed by this [libdrm], please open a new bug report:

commit 4f0f871730b76730ca58209181d16725b0c40184
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 10 09:45:13 2010 +0000

    intel: Handle resetting of input params after EINTR during SET_TILING
    
    The SET_TILING is pernicious in that it overwrites the input arguments
    following an error in order to report the current tiling state of the
    buffer. This caught us by surprise as we then fed those arguments back
    into to the ioctl unmodified following an EINTR and so the kernel then
    reported success for the no-op. We interpreted this success as meaning
    that the tiling on the buffer had changed so updated our state and
    started using the buffer incorrectly in the new tiled/untiled manner.
    This lead to all sorts of random corruption and GPU hangs, even though
    the batch buffers would look sane (when the GPU had not wandered off
    into forbidden territory).
    
    References:
    
      Bug 25475 - [i915] Xorg crash / Execbuf while wedged
      http://bugs.freedesktop.org/show_bug.cgi?id=25475
    
      Bug 25554 - i830_uxa_prepare_access: gtt bo map failed: Input/output error
      http://bugs.freedesktop.org/show_bug.cgi?id=25554
    
    (And probably every other weird bug in the last few months.)
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
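
To make the failure mode described in the commit message concrete, here is a small shell simulation of it. It is not the libdrm or kernel code: fake_set_tiling, set_tiling_buggy and set_tiling_fixed are invented names, and the "ioctl" is just a function that overwrites a shared variable when it is "interrupted".

```shell
# Simulation only: models the behaviour described in the commit message.
# All names are hypothetical; nothing here talks to the real driver.
current_tiling=none   # tiling state the simulated "kernel" holds for the buffer
attempts=0            # number of simulated ioctl calls made so far

# Stand-in for the SET_TILING ioctl. The request is passed in the global
# variable arg_tiling, because the call also writes its result back into it.
fake_set_tiling() {
    attempts=$((attempts + 1))
    if [ "$attempts" -eq 1 ]; then
        # First call is "interrupted": report the current state back through
        # the argument, clobbering what the caller asked for.
        arg_tiling=$current_tiling
        return 4    # stands in for EINTR
    fi
    current_tiling=$arg_tiling
    return 0
}

# Buggy retry: fills in the argument once, then retries with whatever is
# left in it, so the second call "succeeds" as a no-op.
set_tiling_buggy() {
    arg_tiling=$1
    until fake_set_tiling; do :; done
}

# Fixed retry: resets the argument before every attempt.
set_tiling_fixed() {
    while :; do
        arg_tiling=$1
        fake_set_tiling && break
    done
}
```

With `current_tiling=none` and `attempts=0`, `set_tiling_buggy x` returns success while current_tiling is still none, which is the mismatch the commit describes; `set_tiling_fixed x` from the same starting state leaves current_tiling set to x.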

Comment 89 Christian Schafmeister 2010-02-10 03:12:17 UTC
(In reply to comment #88)
>     intel: Handle resetting of input params after EINTR during SET_TILING
>      .....

I tested the problem with

- libdrm svn (with the mentioned commit included)
- intel driver svn (also tested it with 2.9.1)
- mesa 7.5.2
- xorg 7.4

and the problem still exists. But I get a different dmesg output when X is unusable again:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 22980 at 22976)
ath5k phy0: unsupported jumbo

Greetings
Christian
Comment 90 Christian Schafmeister 2010-02-10 03:13:32 UTC
I don't know if I should file a new bug report since the dmesg output changed.
Comment 91 Chris Wilson 2010-02-10 03:45:15 UTC
Please file a new bug report for different errors, as the aforementioned commit fixes the original bug.
Comment 92 Tobias Diedrich 2010-02-10 07:18:54 UTC
(In reply to comment #88)
> Everyone whose bug is not fixed by this [libdrm], please open a new bug report:
> commit 4f0f871730b76730ca58209181d16725b0c40184

Great. I've recompiled libdrm now, if I don't see any errors I'll close https://bugs.freedesktop.org/show_bug.cgi?id=26346 as I'm assuming this will be fixed by that commit.
Comment 93 Tobias Jakobi 2010-02-11 08:28:46 UTC
Just to report back:

I updated my kernel to vanilla-sources 2.6.32.8 and everything else (mesa, libdrm, xf86-vid-intel) to git master and yeahh.. issue seems to be gone. At least I haven't been able to trigger it yet :)

Thanks again Chris for hunting this one down!

Greets,
Tobias
Comment 94 Tobias Diedrich 2010-02-13 05:20:26 UTC
(In reply to comment #92)
> (In reply to comment #88)
> > Everyone whose bug is not fixed by this [libdrm], please open a new bug report:
> > commit 4f0f871730b76730ca58209181d16725b0c40184
> 
> Great. I've recompiled libdrm now, if I don't see any errors I'll close
> https://bugs.freedesktop.org/show_bug.cgi?id=26346 as I'm assuming this will be
> fixed by that commit.

Okay, I haven't seen this bug since upgrading to that libdrm, so I consider this one fixed. Unfortunately the issues described at https://bugs.freedesktop.org/show_bug.cgi?id=26346 still persist.
Comment 95 Tobias Diedrich 2010-02-18 19:48:22 UTC
(In reply to comment #94)
> (In reply to comment #92)
> > (In reply to comment #88)
> > > Everyone whose bug is not fixed by this [libdrm], please open a new bug report:
> > > commit 4f0f871730b76730ca58209181d16725b0c40184
> 
> Okay, I haven't seen this bug since upgrading to that libdrm so I considered
> this one fixed. Unfortunately the issues described at

I did experience another GPU hang yesterday. :(
Apparently, while now much less frequent, it can still happen.
Unfortunately I was in a hurry at the time and didn't make a gpu dump.
I'll open a new bug for it when it happens again.
Comment 96 Petar Velkovski 2010-02-19 02:13:23 UTC
(In reply to comment #95)
> (In reply to comment #94)
> > (In reply to comment #92)
> > > (In reply to comment #88)
> > > > Everyone whose bug is not fixed by this [libdrm], please open a new bug report:
> > > > commit 4f0f871730b76730ca58209181d16725b0c40184
> > 
> > Okay, I haven't seen this bug since upgrading to that libdrm so I considered
> > this one fixed. Unfortunately the issues described at
> 
> I did experience another GPU hang yesterday. :(
> Apparently while now much less frequent it can still happen.
> Unfortunately I was in a hurry at the time and didn't make a gpu dump.
> I'll open a new bug for it when it happens again.
> 

I experienced one or two GPU hangs since this bug was closed, but this time I couldn't switch to console in order to get dmesg or perform a GPU dump.
It's obvious that the bug reported HERE was FIXED.
I'm sure there are other bugs present, but once one of us figures out more specifically what's wrong, they should open a new bug report.
Comment 97 Tomas M. 2010-02-19 03:20:27 UTC
> > 
> > I did experience another GPU hang yesterday. :(
> > Apparently while now much less frequent it can still happen.
> > Unfortunately I was in a hurry at the time and didn't make a gpu dump.
> > I'll open a new bug for it when it happens again.
> > 
> 
> I experienced one or two GPU hangs since this bug was closed but this time I 
> couldn't switch to console in order to get dmesg or perform a GPU dump.
> It's obvious that the bug reported HERE was FIXED.
> I'm sure there are other bugs present, but once some of us figures out more
> specifically what's wrong, he should open a new bug report.
> 

yes, there is another similar bug crawling around. i've got a dmesg and two gpu dumps queued up for a report.

a way to trigger it (not so reliable) is to start xscreensaver and go through the list of screensavers, letting them preview in the small window. (takes a while).

enabling the video overlay feature of kernel 2.6.33 also triggers it after hours of usage.