Bug 27115 - [clarkdale] GPU hangs and resets indefinitely
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: ykzhao
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-16 15:07 UTC by Geir Ove Myhr
Modified: 2017-07-24 23:08 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg with drm.debug=0x02 (245.04 KB, text/plain)
2010-03-16 15:07 UTC, Geir Ove Myhr
Xorg.0.log (14.04 KB, text/plain)
2010-03-16 15:08 UTC, Geir Ove Myhr
i915_error_state (759.59 KB, text/plain)
2010-03-16 15:24 UTC, Geir Ove Myhr

Description Geir Ove Myhr 2010-03-16 15:07:50 UTC
Created attachment 34126
dmesg with drm.debug=0x02

Originally reported by KeithM at:
  https://bugs.launchpad.net/bugs/516909

Occasionally, the GPU hangs, gets reset, then hangs again, and so on. With drm.debug=0x02 on a recent drm-intel-next kernel, dmesg fills up with:

[34143.216468] [drm:i915_add_request], 524154
[34143.216659] [drm:i915_add_request], 524155
[34143.713488] [drm:intel_gpu_idle_timer], idle timer fired, downclocking
[34143.963261] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[34143.963268] render error detected, EIR: 0x00000000
[34143.963284] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 524155 at 512362)
[34143.963298] [drm:i915_error_work_func], generating error event
[34143.963323] [drm:i915_error_work_func], resetting chip
[34143.963533] [drm:gm45_get_vblank_counter], trying to get vblank count for disabled pipe 1
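
For anyone triaging a similar log, the following is a minimal sketch (not part of any driver tooling) that counts hangcheck events and reports how far the GPU was behind the awaited request. It assumes the two numbers in "awaiting N at M" are the awaited and the currently completed sequence numbers, and that -5 is -EIO; the function name summarize_hangs is made up for illustration.

```python
import re

# Matches e.g. "i915_do_wait_request returns -5 (awaiting 524155 at 512362)".
# Assumption: first number is the awaited seqno, second is the current seqno.
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+)\)"
)

def summarize_hangs(lines):
    """Return (hang_count, [(errno, seqno_gap), ...]) for dmesg lines."""
    hangs = 0
    waits = []
    for line in lines:
        if "Hangcheck timer elapsed" in line:
            hangs += 1
        match = WAIT_RE.search(line)
        if match:
            err, awaited, current = map(int, match.groups())
            waits.append((err, awaited - current))
    return hangs, waits
```

Fed the excerpt above, this reports one hang and a single failed wait with a gap of 524155 - 512362 sequence numbers.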

keith@newb:~$ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation Clarkdale Integrated Graphics Controller [8086:0042] (rev 12)
Comment 1 Geir Ove Myhr 2010-03-16 15:08:56 UTC
Created attachment 34127
Xorg.0.log
Comment 2 Geir Ove Myhr 2010-03-16 15:24:12 UTC
Created attachment 34128
i915_error_state

Decoded by intel_error_decode at http://launchpadlibrarian.net/41062268/IntelErrorDecode.txt. I find two things interesting:
1. ACTHD: 0x116179c8, which is outside any of the buffers.
2. IPEHR: 0x01800002, since I have lately seen a lot of MI_WAIT_FOR_EVENT "Display Pipe A/B Scan Line Window Wait Enable" in this register in other bug reports on i965 and GM45.
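
As a rough illustration of why 0x01800002 reads as MI_WAIT_FOR_EVENT, here is a small sketch that classifies a command header dword. It assumes the MI_INSTR(opcode, flags) = (opcode << 23) | flags layout used in the kernel's i915_reg.h; the opcode table is deliberately partial, and decode_header is a made-up helper, not the intel_error_decode implementation.

```python
# Partial opcode table for MI commands (client field 0, opcode in bits 28:23).
MI_OPCODES = {
    0x00: "MI_NOOP",
    0x03: "MI_WAIT_FOR_EVENT",
    0x04: "MI_FLUSH",
    0x0A: "MI_BATCH_BUFFER_END",
    0x12: "MI_LOAD_SCAN_LINES_INCL",
}

def decode_header(dword):
    """Name the command encoded in a header dword, as far as this table knows."""
    client = (dword >> 29) & 0x7
    if client == 0:
        # Memory interface command: opcode lives in bits 28:23.
        opcode = (dword >> 23) & 0x3F
        return MI_OPCODES.get(opcode, "MI opcode 0x%02x" % opcode)
    if client == 2:
        # 2D/blitter command: opcode lives in bits 28:22.
        opcode = (dword >> 22) & 0x7F
        if opcode == 0x53:
            return "XY_SRC_COPY_BLT"
        return "2D opcode 0x%02x" % opcode
    return "client %d" % client
```

With this table, decode_header(0x01800002) yields MI_WAIT_FOR_EVENT; the low bits of the dword select which event is waited on and are not decoded here.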
Comment 3 KeithM 2010-03-16 17:02:18 UTC
I'm the original bug filer over at Ubuntu launchpad 516909.

I'm at your disposal to try new kernels, patches, etc.  Please provide enough detail/links to howto's when replying --- I've only started messing with kernels, options, PPAs, etc recently.

Adding myself to CC: list.

Thanks

Keith


Comment 4 Chris Wilson 2010-03-18 11:33:49 UTC
(In reply to comment #2)
> Created an attachment (id=34128) [details]
> i915_error_state
> 
> Decoded by intel_error_decode at
> http://launchpadlibrarian.net/41062268/IntelErrorDecode.txt . I find two things
> interesting:
> 1. ACTHD: 0x116179c8 which is outside any of the buffers.

Looks like we are reading the wrong register for Active Head on Ironlake when grabbing error state.

> 2. IPEHR: 0x01800002 since I have seen a lot of MI_WAIT_FOR_EVENT "Display Pipe
> A/B Scan Line Window Wait Enable" in this register at other bug reports on i965
> and GM45 lately.

Similarly, I know there has been a lot of poking in this area in order to set up event triggering. Reassigning to a more knowledgeable person.
Comment 5 Geir Ove Myhr 2010-03-18 12:38:44 UTC
Based on a lot of similar automatic bug reports for Ubuntu, the currently active batchbuffer is probably:

0x0e043000:      0x09000000: MI_LOAD_SCAN_LINES_INCL
0x0e043004:      0x000004b0:    dword 1
0x0e043008:      0x09000000: MI_LOAD_SCAN_LINES_INCL
0x0e04300c:      0x000004b0:    dword 1
0x0e043010:      0x01800002: MI_WAIT_FOR_EVENT
0x0e043014:      0x54f08806: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 1, dst tile 1)
0x0e043018:      0x03cc0680:    format 8888, dst pitch 1664, clipping disabled
0x0e04301c:      0x00000000:    dst (0,0)
0x0e043020:      0x04b00640:    dst (1600,1200)
0x0e043024:      0x04781000:    dst offset 0x04781000
0x0e043028:      0x00000000:    src (0,0)
0x0e04302c:      0x00000680:    src pitch 1664
0x0e043030:      0x0bb98000:    src offset 0x0bb98000
0x0e043034:      0x02000000: MI_FLUSH
0x0e043038:      0x00000000: MI_NOOP
0x0e04303c:      0x05000000: MI_BATCH_BUFFER_END

and the correct value for ACTHD is 0x0e043014. There are many bug reports downstream with a hang in this kind of batch buffer on GM45 and 965GM, and ACTHD is always 0x.....014.
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/539804
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/539538
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/537874
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/535218
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/535010
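
The address reasoning above can be checked with a short sketch. The batch bounds are taken from the listing in this comment (first dword at 0x0e043000, MI_BATCH_BUFFER_END at 0x0e04303c); the helper names in_batch, dword_index and unpack_xy are hypothetical and only serve this example.

```python
BATCH_START = 0x0e043000  # address of the first dword in the listing
BATCH_END = 0x0e043040    # one dword past MI_BATCH_BUFFER_END at 0x0e04303c

def in_batch(addr, start=BATCH_START, end=BATCH_END):
    """True if addr points inside the batch buffer listed above."""
    return start <= addr < end

def dword_index(addr, start=BATCH_START):
    """Zero-based index of the dword that addr points at."""
    return (addr - start) // 4

def unpack_xy(dword):
    """Split a packed coordinate dword into (x, y): x in the low 16 bits."""
    return dword & 0xFFFF, dword >> 16
```

The reported ACTHD 0x116179c8 falls outside this range, while 0x0e043014 is dword 5, the XY_SRC_COPY_BLT header immediately after the MI_WAIT_FOR_EVENT at dword 4. Likewise unpack_xy(0x04b00640) gives the (1600, 1200) corner shown in the decode.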

Comment 6 KeithM 2010-03-31 11:42:52 UTC
Yakui,

Is there any additional information you might need to fix this?  Other things I can try?

Thanks
Keith
Comment 7 KeithM 2010-04-10 13:14:47 UTC
Please see https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/516909/comments/58

It seems upgrading to the 2.6.32-19.28 kernel fixes my problem.

I've run testing with it for 3-4 days, and everything is much improved.

Please change the status as appropriate.

Thanks

Keith
Comment 8 Geir Ove Myhr 2010-04-11 01:53:52 UTC
(In reply to comment #7)
> It seems upgrading to the 2.6.32-19.28 kernel fixes my problem.

The patch included in 2.6.32-19.28 (with 2.6.33.1 drm + patches) that was intended to fix this is:

  [ Jesse Barnes ]
  * SAUCE: drm/i915: don't change DRM configuration when releasing load
    detect pipe

http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commit;h=0d2907f4bead56cff60f91068b3a3efa7149e702

I haven't seen this being applied upstream in linux-2.6.33.y or linux-2.6 git trees.
Comment 9 ykzhao 2010-06-20 18:52:23 UTC
(In reply to comment #7)
> Please see
> https://bugs.launchpad.net/xserver-xorg-video-intel/+bug/516909/comments/58
> 
> It seems upgrading to the 2.6.32-19.28 kernel fixes my problem.
> 
> I've run testing with it for 3-4 days, and everything is much improved.
> 
> Please change the status as appropriate.
> 
> Thanks
> 
> Keith

Thanks for the update. This bug will be marked as resolved.

