Bug 28204

Summary:

965GM firefox crashes/corruption of screen GPU hung

Product:

xorg

Reporter:

Martin Sillence <martin>

Component:

Driver/intel

Assignee:

Chris Wilson <chris>

Status:

RESOLVED INVALID

QA Contact:

Xorg Project Team <xorg-team>

Severity:

major

Priority:

medium

CC:

kenyon

Version:

7.5 (2009.10)

Hardware:

x86-64 (AMD64)

OS:

Linux (All)

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
xorg log of failure	none
last batch buffer before GPU hang	none
Another failure without the crash	none

Description Martin Sillence 2010-05-21 06:37:16 UTC

Architecture: x86_64
Kernel: 2.6.33-2-amd64
lib drm version: libdrm-intel1 2.4.18-5
Distro: Debian experimental drivers - (testing/unstable also fail - in a different way)
Machine: Sony Vaio SZ680
Screen: LCD - LVDS1

When running Firefox, X starts to fail quite quickly. Ultimately, either the machine stops responding or windows start to close as programs crash. Text also starts to appear corrupted.

Xorg logs after failure:
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
(WW) intel(0): i830_uxa_pixmap_swap_bo_with_image: bo map failed
(WW) intel(0): i830_uxa_pixmap_swap_bo_with_image: bo map failed
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_pixmap_swap_bo_with_image: bo map failed
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error


Kernel logs after failure:
May 21 10:10:04 griffin kernel: [  544.524041] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 21 10:10:04 griffin kernel: [  544.524058] render error detected, EIR: 0x00000000
May 21 10:10:04 griffin kernel: [  544.524100] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 121124 at 121120)
May 21 10:10:05 griffin kernel: [  545.524074] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 21 10:10:05 griffin kernel: [  545.524086] render error detected, EIR: 0x00000000
May 21 10:10:05 griffin kernel: [  545.524138] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 121184 at 121174)
May 21 10:10:06 griffin kernel: [  546.488050] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
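
When comparing hangs across runs, the hangcheck count and the "awaiting N at M" pairs can be pulled out of the kernel log mechanically. The following is a minimal sketch, assuming the line format shown above. The name parse_hangcheck is hypothetical, and reading the two numbers as the awaited and the current request sequence number is an assumption based on the message wording. The return code -5 is -EIO, which matches the "Input/output error" seen in the Xorg log.

```python
import re

# Matches the tail of lines such as:
#   ... i915_do_wait_request returns -5 (awaiting 121124 at 121120)
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+)\)"
)

HANG_MARKER = "Hangcheck timer elapsed... GPU hung"


def parse_hangcheck(lines):
    """Return (hang_count, waits).

    waits is a list of (return_code, awaited, current) integer tuples,
    one per i915_do_wait_request error line found in the input.
    """
    hangs = 0
    waits = []
    for line in lines:
        if HANG_MARKER in line:
            hangs += 1
        match = WAIT_RE.search(line)
        if match:
            waits.append(tuple(int(group) for group in match.groups()))
    return hangs, waits


# Lines copied from the kernel log above (syslog prefix dropped).
SAMPLE = [
    "[  544.524041] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung",
    "[  544.524058] render error detected, EIR: 0x00000000",
    "[  544.524100] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 121124 at 121120)",
    "[  545.524074] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung",
]

if __name__ == "__main__":
    hang_count, wait_errors = parse_hangcheck(SAMPLE)
    print("hangs:", hang_count)
    for code, awaited, current in wait_errors:
        print("return", code, "awaited", awaited, "current", current)
```

In practice the input would be the lines of the kernel log or dmesg output rather than the embedded sample.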

Comment 1 Martin Sillence 2010-05-21 06:41:10 UTC

Created attachment 35781
xorg log of failure

Comment 2 Julien Cristau 2010-05-21 07:08:03 UTC

On Fri, May 21, 2010 at 06:37:16 -0700, bugzilla-daemon@freedesktop.org wrote:

> lib drm version: libdrm-intel1 2.4.18-5

you should probably upgrade that.

Comment 3 Martin Sillence 2010-05-21 07:36:35 UTC

> > lib drm version: libdrm-intel1 2.4.18-5
> you should probably upgrade that.

OK upgraded to 
libdrm-intel1 2.4.20-2
and kernel: 2.6.34-1-amd64 #1 SMP

Now get:

Kernel:
May 21 15:28:41 griffin kernel: [  302.284025] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 21 15:28:41 griffin kernel: [  302.284207] render error detected, EIR: 0x00000000
May 21 15:28:41 griffin kernel: [  302.284243] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 9126 at 9125)
May 21 15:28:43 griffin kernel: [  304.120040] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 21 15:28:43 griffin kernel: [  304.120051] render error detected, EIR: 0x00000000
May 21 15:28:43 griffin kernel: [  304.120079] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 9130 at 9129)

Xorg:
(WW) intel(0): i830_uxa_pixmap_swap_bo_with_image: bo map failed
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_pixmap_swap_bo_with_image: bo map failed
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_pixmap_swap_bo_with_image: bo map failed
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_pixmap_swap_bo_with_image: bo map failed
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error

Fatal server error:
Failed to map batchbuffer: Input/output error

Comment 4 Martin Sillence 2010-05-21 07:37:35 UTC

Created attachment 35784
last batch buffer before GPU hang

As per: http://intellinuxgraphics.org/i915_error_state.html

Comment 5 Chris Wilson 2010-05-21 09:18:50 UTC

Oh, this bug. I've only seen this after removing the MI_FLUSH, but since I can neither explain how the code blows up nor how the random MI_FLUSH prevents the aforementioned explosion, I suspect you've found an instance where it blows up even with the unexplainable flushing. Fixing this crippling i965 bug is definitely on my todo list.

Comment 6 Martin Sillence 2010-05-21 10:01:38 UTC

Created attachment 35786
Another failure without the crash

I don't know whether it's always the same failure, or how to check, but this is a trace where everything still seems to be working even though the errors appear in the kernel log.

Comment 7 Martin Sillence 2010-05-21 14:35:42 UTC

If it helps, it seems reasonably easy to provoke the crash/failure.
Start up Firefox with multiple saved tabs (10 or so) and wait for them to load. That seems to be enough to break X. I was also running Google Chrome when X died. It takes less than a minute for X to fail.

Comment 8 Martin Sillence 2010-07-18 10:49:04 UTC

Hi, I've been testing the latest package in debian created by Cyril:

> Cyril Brulebois <kibi@debian.org> (12/07/2010):
>> It would be nice to know how it goes with the packages I built (for
>> i386 + amd64) and uploaded there:
>>   http://people.debian.org/~kibi/packages/xserver-xorg-video-intel/
>
> I've put a new version there: 2.12.0-1+ickle2

It looks a lot better already: in my limited testing there have been no crashes yet, and it was so easy to provoke before.
modeset=0 is vital for it to work.
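
To confirm which modeset value the running kernel actually picked up, the module parameter can be read back. This is a minimal sketch, assuming the i915 module is loaded and exposes its parameters under /sys/module/i915/parameters/ on the kernel in use. The function names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the i915 modeset parameter.
DEFAULT_PARAM = "/sys/module/i915/parameters/modeset"


def parse_modeset(text):
    """Convert the parameter file contents to an int, or None if malformed."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_i915_modeset(path=DEFAULT_PARAM):
    """Return the modeset value, or None if the file cannot be read."""
    try:
        return parse_modeset(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("i915 modeset:", read_i915_modeset())
```

A result of 0 would match the modeset=0 setup described here; None means the value could not be read, for example because the module is not loaded.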

I note x-video isn't working/supported:

$ xvinfo
X-Video Extension version 2.2
screen #0
 no adaptors present

Apart from that a massive improvement, many thanks.

To link them up, the Debian bug is 551387.

Hibernate and resume is working.

Thanks again,
M

Comment 9 alium 2010-08-01 22:52:29 UTC

The GPU hang on Gen3 graphics (like 945GM) should be fixed in 2.6.35-rc6.

With Gen4 hardware (like G45), I found that the bug appears after upgrading to mesa 7.8.2 and libdrm 2.4.21. I have no more issues when I roll back to mesa 7.7 and libdrm 2.4.19 (tested with kernels 2.6.32 and 2.6.34).

Comment 10 Jonas Thiem 2010-08-14 11:43:19 UTC

I have a 945GME (Asus Eee 1000H) and have apparently run into the very same issue, which can be very annoying. It does NOT happen that fast for me though: it only occurs rarely (every two weeks or so), and not always when I have a lot of Firefox tabs open. I was using intel 2.12.0.

Also it was triggered with Opera, so it's not just firefox causing this.

I screenshotted the text corruption:
 http://eloxoph.com/intel2.12corruption1.png
 http://eloxoph.com/intel2.12corruption11.png

Also I got similar Xorg.0.log entries:
 [  9898.021] (WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
 [  9898.021] (WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
 [  9898.022] (WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
 [  9898.022] (WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error

Very interestingly, I also got kernel oopses which fired exactly when the text corruption started:

WARNING: at mm/highmem.c:453 debug_kmap_atomic+0xad/0x12a()
Hardware name: 1000H
Modules linked in: fuse sunrpc cpufreq_ondemand acpi_cpufreq xt_physdev
nf_conntrack_netbios_ns ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 vboxnetadp vboxnetflt vboxdrv uinput snd_hda_codec_realtek
snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer
snd iTCO_wdt eeepc_laptop iTCO_vendor_support soundcore rt2860sta
snd_page_alloc uvcvideo sparse_keymap rfkill atl1e videodev v4l1_compat joydev
microcode aes_i586 aes_generic xts gf128mul dm_crypt i915 drm_kms_helper drm
i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.33.6-147.fc13.i686.PAE #1
Call Trace:
[<c043d69d>] warn_slowpath_common+0x65/0x7c
[<c04b12e1>] ? debug_kmap_atomic+0xad/0x12a
[<c043d6c1>] warn_slowpath_null+0xd/0x10
[<c04b12e1>] debug_kmap_atomic+0xad/0x12a
[<c042a91b>] kmap_atomic_prot+0x5c/0x10c
[<c04c9162>] ? __kmalloc+0x103/0x10f
[<c042a9df>] kmap_atomic+0x14/0x16
[<f8008083>] i915_error_object_create+0x9f/0xfa [i915]
[<f80083f2>] i915_handle_error+0x314/0x813 [i915]
[<f8008990>] i915_hangcheck_elapsed+0x9f/0xdf [i915]
[<c0448749>] run_timer_softirq+0x163/0x1e6
[<f80088f1>] ? i915_hangcheck_elapsed+0x0/0xdf [i915]
[<c0442a79>] __do_softirq+0xac/0x152
[<c0442b50>] do_softirq+0x31/0x3c
[<c0442c64>] irq_exit+0x29/0x5c
[<c041d6d7>] smp_apic_timer_interrupt+0x6f/0x7d
[<c078358d>] apic_timer_interrupt+0x31/0x38
[<c045007b>] ? __cancel_work_timer+0x12a/0x15d
[<c05ffe10>] ? acpi_idle_enter_simple+0x10a/0x13d
[<c06d886b>] cpuidle_idle_call+0x6e/0xc3
[<c0407ab8>] cpu_idle+0x91/0xad
[<c077e7a7>] start_secondary+0x1f5/0x233

uname -r: 2.6.33.6-147.2.4.fc13.i686.PAE

glxinfo | grep Mesa:
client glx vendor string: Mesa Project and SGI
OpenGL renderer string: Mesa DRI Intel(R) 945GME GEM 20100328 2010Q1 
OpenGL version string: 1.4 Mesa 7.8.1

Reopening programs does FREE them from text corruption for me!
BUT if I open Blender after any corruption has been shown, X dies instantly with a black screen/freeze (I tested this both times the text corruption struck). After a reboot everything is fine again and Blender also runs perfectly.

Reproducing the issue deliberately seems close to impossible for me; it really just happens rarely.

Comment 11 Martin Sillence 2010-11-01 10:36:46 UTC

Hi, 

I've tried the latest release, 2.13, with the latest kernel, 2.6.35-trunk (2.6.35-rc6 was too unstable). This combination still results in GPU hung messages and applications crashing.

Is there anything I can do to help? Do more reports help here? Would you like a new bug with the logs, or should I attach them to this one?

Would remote access to my laptop help?

Thanks, M

Comment 12 Martin Sillence 2011-02-12 14:46:50 UTC

See Bug 30637: hardware memory fault.
