Bug 23069 - frequent hangs with and without kms using intel 2.8.0 and kernel 2.6.31-rc4 on samsung nc 10
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.4 (2008.09)
Hardware: x86 (IA32) Linux (All)
Importance: high blocker
Assignee: Wang Zhenyu
QA Contact: Xorg Project Team
 
Reported: 2009-08-01 01:50 UTC by Soeren Sonnenburg
Modified: 2009-08-28 04:50 UTC

Attachments
dmesg (453.29 KB, text/plain)
2009-08-01 01:50 UTC, Soeren Sonnenburg
acpidump after boot (108.53 KB, application/octet-stream)
2009-08-01 01:51 UTC, Soeren Sonnenburg
kernel config (16.80 KB, application/x-gzip)
2009-08-01 01:52 UTC, Soeren Sonnenburg
gpu dump after hang occurring after resume from s2ram with nomodeset=1 (76.54 KB, application/bzip2)
2009-08-03 10:24 UTC, Soeren Sonnenburg
intel_gpu_dump after hang with kms (79.78 KB, application/bzip2)
2009-08-12 07:39 UTC, Soeren Sonnenburg
dmesg after hang with kms (16.00 KB, application/bzip2)
2009-08-12 07:41 UTC, Soeren Sonnenburg

Description Soeren Sonnenburg 2009-08-01 01:50:28 UTC
Created attachment 28245
dmesg

I get two kinds of hangs: ones where I can still move the mouse pointer but the display is frozen, and ones where both mouse and display freeze (particularly after resuming from s2ram). In all cases I can still do alt+sysrq+s and see the hard disk light flashing, even though alt+sysrq+b does not successfully reboot the machine...

The hangs seem to occur more frequently with kms enabled and less often without it (passing nomodeset=1).

Needless to say things were rock-stable with EXA before (month of uptime after hundreds of s2ram cycles).

If you need any output of anything please say so.

(this is on a debian-sid machine).
Comment 1 Soeren Sonnenburg 2009-08-01 01:51:00 UTC
Created attachment 28246
acpidump after boot
Comment 2 Soeren Sonnenburg 2009-08-01 01:52:21 UTC
Created attachment 28247
kernel config
Comment 3 Soeren Sonnenburg 2009-08-01 01:56:12 UTC
Debian versions of the xorg/intel driver, mesa and compiz (the running wm):

# dpkg -l | egrep -e '(xorg|mesa|compiz)' | awk '{print $2 " " $3}'
compiz 0.8.2-6
compiz-core 0.8.2-6
compiz-dev 0.8.2-6
compiz-fusion-plugins-extra 0.8.2-3
compiz-fusion-plugins-main 0.8.2-3
compiz-gnome 0.8.2-6
compiz-gtk 0.8.2-6
compiz-plugins 0.8.2-6
compizconfig-backend-gconf 0.8.2-1
compizconfig-settings-manager 0.8.2-2
libcompizconfig-dev 0.8.2-2
libcompizconfig0 0.8.2-2
libgl1-mesa-dev 7.5-3
libgl1-mesa-dri 7.5-3
libgl1-mesa-dri-dbg 7.5-3
libgl1-mesa-glx 7.5-3
libgl1-mesa-glx-dbg 7.5-3
libglu1-mesa 7.5-3
libglu1-mesa-dev 7.5-3
libglu1-xorg 1:7.4+3
libglu1-xorg-dev 1:7.4+3
mesa-common-dev 7.5-3
mesa-swx11-source 7.0.3-7
mesa-utils 7.5-3
mesademos 6.2.1-2
python-compizconfig 0.8.2-1
xorg 1:7.4+3
xorg-docs 1:1.4-5
xorg-docs-core 1:1.4-5
xserver-xorg 1:7.4+3
xserver-xorg-core 2:1.6.2.901-1
xserver-xorg-dev 2:1.6.2.901-1
xserver-xorg-input-evdev 1:2.2.3-1
xserver-xorg-input-kbd 1:1.3.2-3
xserver-xorg-input-mouse 1:1.4.0-2
xserver-xorg-input-synaptics 1.1.2-1
xserver-xorg-video-intel 2:2.8.0-1
xserver-xorg-video-intel-dbg 2:2.8.0-1
xserver-xorg-video-r128 6.8.1-1
xserver-xorg-video-radeon 1:6.12.2-3
xserver-xorg-video-vesa 1:2.2.0-1
Comment 4 Gordon Jin 2009-08-01 02:34:05 UTC
Could you provide the intel_gpu_dump output after the hang? See http://intellinuxgraphics.org/intel-gpu-dump.html
Comment 5 Soeren Sonnenburg 2009-08-03 10:24:00 UTC
Created attachment 28311
gpu dump after hang occurring after resume from s2ram with nomodeset=1

Hmmhh, I am confused: While grabbing the dump and filling in the elements the machine suddenly (after 20 minutes) came back to life. Before this neither cursor nor display (nor ctrl+alt+f1) were functional.
Comment 6 Wang Zhenyu 2009-08-09 20:35:47 UTC
So in KMS, does this only happen after suspend-to-ram?
Comment 7 Wang Zhenyu 2009-08-09 20:40:42 UTC
Could you test with the current tip of Linus's linux-2.6 git tree? Thanks.
Comment 8 Soeren Sonnenburg 2009-08-09 21:27:45 UTC
I just realized that the particular hang after resume is simply a gnome-screensaver bug, so that part of the bug report is a false alarm. However, as I am currently traveling, I cannot easily grab a gpu dump of the case where the mouse cursor still moves but the graphics freeze. Nevertheless, I will try with updated intel-git / linux-git over this week and report back.
Comment 9 Soeren Sonnenburg 2009-08-11 22:17:35 UTC
The hang with kms is still there even with intel-git, kernel-2.6.31-rc5-git. It happens once per day. Symptoms: X display freezes (including mouse cursor) but I can still do alt+sysrq+{s,b}. It will unfortunately take a while to get the gpu dump.
Comment 10 Soeren Sonnenburg 2009-08-12 00:12:15 UTC
After rebooting (just after the 'hang') I noticed the following in dmesg. It might not be relevant, though, as it could be a follow-up problem (and note that despite it the machine works perfectly OK):

Linux agpgart interface v0.103
agpgart-intel 0000:00:00.0: Intel 945GME Chipset
agpgart-intel 0000:00:00.0: detected 7932K stolen memory
agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000
[drm] Initialized drm 1.1.0 20060810
i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
i915 0000:00:02.0: setting latency timer to 64
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
fbcon: inteldrmfb (fb0) is primary device
[drm] DAC-6: set mode 1920x1200 17
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 1, comm: swapper Not tainted 2.6.31-rc5-sonne #8
Call Trace:
 [<c146564e>] ? printk+0x18/0x1a
 [<c10755e7>] __report_bad_irq+0x27/0x90
 [<c107444c>] ? handle_IRQ_event+0x7c/0x1a0
 [<c10757a6>] note_interrupt+0x156/0x1a0
 [<c1075e8c>] handle_fasteoi_irq+0xac/0xd0
 [<c10058ea>] handle_irq+0x1a/0x20
 [<c1004e96>] do_IRQ+0x46/0xc0
 [<c104457f>] ? irq_exit+0x2f/0x90
 [<c1019629>] ? smp_apic_timer_interrupt+0x59/0x90
 [<c1003529>] common_interrupt+0x29/0x30
 [<c12c00d8>] ? drm_core_ioremapfree+0x38/0x60
 [<c12ff3a9>] ? intel_lvds_set_power+0xf9/0x110
 [<c12ff662>] intel_lvds_prepare+0x42/0x50
 [<c12c9898>] drm_crtc_helper_set_mode+0x3e8/0x470
 [<c12c9fa1>] drm_crtc_helper_set_config+0x5d1/0x700
 [<c1249cff>] ? soft_cursor+0x7f/0x1e0
 [<c1306087>] intelfb_pan_display+0xd7/0x100
 [<c1305fb0>] ? intelfb_pan_display+0x0/0x100
 [<c1239479>] fb_pan_display+0xe9/0x130
 [<c1248efd>] bit_update_start+0x1d/0x40
 [<c1245810>] fbcon_switch+0x350/0x480
 [<c12accb8>] redraw_screen+0x118/0x1f0
 [<c12af714>] ? vc_do_resize+0x2a4/0x410
 [<c12af82f>] vc_do_resize+0x3bf/0x410
 [<c12af8eb>] vc_resize+0x1b/0x20
 [<c1247b85>] fbcon_init+0x2b5/0x480
 [<c12aae21>] visual_init+0xa1/0xf0
 [<c12af28a>] take_over_console+0x23a/0x420
 [<c1247dac>] fbcon_takeover+0x5c/0xb0
 [<c1248a10>] fbcon_event_notify+0x7f0/0x8d0
 [<c1004e9f>] ? do_IRQ+0x4f/0xc0
 [<c107444c>] ? handle_IRQ_event+0x7c/0x1a0
 [<c12eae1b>] ? i915_driver_irq_handler+0x2ab/0xcc0
 [<c107444c>] ? handle_IRQ_event+0x7c/0x1a0
 [<c101c04e>] ? ack_apic_level+0x7e/0x290
 [<c1075e71>] ? handle_fasteoi_irq+0x91/0xd0
 [<c104457f>] ? irq_exit+0x2f/0x90
 [<c1004e9f>] ? do_IRQ+0x4f/0xc0
 [<c107444c>] ? handle_IRQ_event+0x7c/0x1a0
 [<c101c04e>] ? ack_apic_level+0x7e/0x290
 [<c1003529>] ? common_interrupt+0x29/0x30
 [<c10400d8>] ? wait_noreap_copyout+0xa8/0xe0
 [<c1248224>] ? fbcon_event_notify+0x4/0x8d0
 [<c105a7ed>] notifier_call_chain+0x2d/0x70
 [<c105ab74>] __blocking_notifier_call_chain+0x44/0x60
 [<c105abaa>] blocking_notifier_call_chain+0x1a/0x20
 [<c1239051>] fb_notifier_call_chain+0x11/0x20
 [<c123a12e>] register_framebuffer+0x1fe/0x350
 [<c12cafc8>] ? drm_mode_duplicate+0x18/0x60
 [<c130717e>] intelfb_probe+0x60e/0x720
 [<c12ca438>] drm_helper_initial_config+0x38/0x1b0
 [<c12e8561>] i915_driver_load+0x1011/0x1110
 [<c12c132f>] drm_get_dev+0x2ff/0x4c0
 [<c111406f>] ? sysfs_addrm_start+0x3f/0xa0
 [<c145b4a8>] i915_pci_probe+0xd/0x15
 [<c123033e>] local_pci_probe+0xe/0x10
 [<c12311a0>] pci_device_probe+0x60/0x80
 [<c130fb75>] driver_probe_device+0x75/0x190
 [<c130fd19>] __driver_attach+0x89/0xa0
 [<c130f40b>] bus_for_each_dev+0x5b/0x80
 [<c130fa19>] driver_attach+0x19/0x20
 [<c130fc90>] ? __driver_attach+0x0/0xa0
 [<c130ed87>] bus_add_driver+0x247/0x300
 [<c12303f0>] ? pci_device_shutdown+0x0/0x30
 [<c12310e0>] ? pci_device_remove+0x0/0x40
 [<c130ffb5>] driver_register+0x75/0x170
 [<c12315d0>] __pci_register_driver+0x40/0xb0
 [<c12bcbb1>] drm_init+0xf1/0x100
 [<c16b04f6>] ? i915_init+0x0/0x48
 [<c16b053c>] i915_init+0x46/0x48
 [<c100112a>] do_one_initcall+0x2a/0x150
 [<c1109525>] ? create_proc_entry+0x55/0xa0
 [<c10769a5>] ? register_irq_proc+0xa5/0xc0
 [<c1076a25>] ? init_irq_proc+0x65/0x80
 [<c168b32d>] kernel_init+0x13a/0x191
 [<c168b1f3>] ? kernel_init+0x0/0x191
 [<c10039cf>] kernel_thread_helper+0x7/0x18
handlers:
[<c12eab70>] (i915_driver_irq_handler+0x0/0xcc0)
Disabling IRQ #16
[drm] LVDS-8: set mode 1024x600 18
Console: switching to colour frame buffer device 128x37
[drm] fb0: inteldrmfb frame buffer device
[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Comment 11 Wang Zhenyu 2009-08-12 00:22:59 UTC
Interesting. So it looks like we got an irq that was not handled, which made the kernel disable it, and then we won't get any further irqs? Could you attach the intel_gpu_dump output from that time?
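
For anyone triaging a similar log, here is a minimal sketch of how one might check a saved dmesg for this pattern. It only looks for the two kernel messages quoted in comment 10 ("irq N: nobody cared" and "Disabling IRQ #N"). The function name `find_disabled_irqs` and the sample text are illustrative assumptions, not part of any existing tool.

```python
import re

# Patterns for the two kernel messages quoted in comment 10.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")


def find_disabled_irqs(dmesg_text):
    """Return sorted IRQ numbers that were reported as unhandled AND then disabled.

    Hypothetical helper for illustration only.
    """
    reported = set()
    disabled = set()
    for line in dmesg_text.splitlines():
        m = NOBODY_CARED.search(line)
        if m:
            reported.add(int(m.group(1)))
        m = DISABLING.search(line)
        if m:
            disabled.add(int(m.group(1)))
    return sorted(reported & disabled)


# Sample lines copied from the dmesg excerpt in comment 10.
sample = """\
irq 16: nobody cared (try booting with the "irqpoll" option)
handlers:
[<c12eab70>] (i915_driver_irq_handler+0x0/0xcc0)
Disabling IRQ #16
"""

print(find_disabled_irqs(sample))
```

If IRQ 16 shows up in the result, the i915 interrupt line has been shut off by the kernel, which would fit the theory that no further GPU interrupts arrive afterwards.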
Comment 12 Soeren Sonnenburg 2009-08-12 07:39:56 UTC
Created attachment 28549
intel_gpu_dump after hang with kms
Comment 13 Soeren Sonnenburg 2009-08-12 07:41:41 UTC
Created attachment 28550
dmesg after hang with kms

Look at kms_dmesg_after_hang.txt.bz2, there is a

intel_gpu_dump: page allocation failure. order:9, mode:0x40d0

...
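
As a rough aid for reading the `order:9` in that message: an order-n page allocation asks for 2**n physically contiguous pages. The sketch below works out the size, assuming 4 KiB pages (the usual size on 32-bit x86; the page size is not stated anywhere in this report). The helper name `allocation_bytes` is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on 32-bit x86


def allocation_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of a page allocation of the given order (2**order contiguous pages)."""
    return (1 << order) * page_size


size = allocation_bytes(9)
print(size, "bytes =", size // (1024 * 1024), "MiB")
```

Under that assumption the failed request was for 512 contiguous pages. A request that large can plausibly fail on a long-running machine with fragmented memory, so the message may say more about the dump tool's allocation than about the hang itself.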
Comment 14 Wang Zhenyu 2009-08-12 23:00:56 UTC
I managed to get an nc10. Could you describe how you trigger this? And what are your desktop environment and apps?
Comment 15 Soeren Sonnenburg 2009-08-12 23:47:54 UTC
(In reply to comment #14)
> I managed to get one nc10, could you describe how you trigger this? and what's
> your desktop environment and apps?

I am afraid it is not that easy to trigger (at least I don't know of a reliable way to trigger the problem) :( Anyway, I am using gnome + compiz on debian-sid, with the kernel and the intel xorg driver compiled manually from git. Then I simply work on the nc10 for about one day (from home and from work). This includes a couple of s2ram cycles and connecting the nc10 to a 24" dell display, disabling the internal one, while still using compiz and 3d-screensavers...

So far all the hangs have occurred in the middle of doing work (in a shell, logged in remotely and showing things to someone). And well, it basically happens every day at about 4pm...
Comment 16 Soeren Sonnenburg 2009-08-17 00:44:30 UTC
Today I have a record uptime of 4 days (without using an external display!). I wonder whether this bug only occurs when an external display is connected. I will once again upgrade to git-current and see what I get... It might be unrelated, but I've seen 3 errors in dmesg related to gem:

[drm] LVDS-8: set mode  25
[drm:i915_gem_object_bind_to_gtt] *ERROR* GTT full, but LRU list empty
[drm:i915_gem_object_pin] *ERROR* Failure to bind: -28
[drm:i915_gem_evict_something] *ERROR* inactive empty 1 request empty 1 flushing empty 1

More context:

$ dmesg | egrep -e '(intel|drm|i915)'
i915 0000:00:02.0: PCI INT A disabled
i915 0000:00:02.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
i915 0000:00:02.0: setting latency timer to 64
[drm] DAC-6: set mode 1920x1200 17
[drm] LVDS-8: set mode  25
[drm:i915_gem_object_bind_to_gtt] *ERROR* GTT full, but LRU list empty
[drm:i915_gem_object_pin] *ERROR* Failure to bind: -28
[drm:i915_gem_evict_something] *ERROR* inactive empty 1 request empty 1 flushing empty 1
[drm] LVDS-8: set mode  2d
[drm] DAC-6: set mode 1920x1200 17
i915 0000:00:02.0: PCI INT A disabled
i915 0000:00:02.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
i915 0000:00:02.0: setting latency timer to 64
[drm] DAC-6: set mode 1920x1200 17
[drm] LVDS-8: set mode  2d
i915 0000:00:02.0: PCI INT A disabled
i915 0000:00:02.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
i915 0000:00:02.0: setting latency timer to 64
[drm] DAC-6: set mode 1920x1200 17
[drm] LVDS-8: set mode  2d
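
A small sketch for decoding the `-28` in the "Failure to bind" line above: kernel functions conventionally return negated errno values, so the number can be looked up with Python's standard `errno` and `os` modules. The helper name `decode_bind_failure` is hypothetical and only parses the message format quoted in this comment.

```python
import errno
import os
import re

# Matches the i915 GEM error line quoted in comment 16.
BIND_FAILURE = re.compile(r"Failure to bind: -(\d+)")


def decode_bind_failure(line):
    """Return (symbolic errno name, description) for a 'Failure to bind' line, or None.

    Hypothetical helper for illustration only.
    """
    m = BIND_FAILURE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


line = "[drm:i915_gem_object_pin] *ERROR* Failure to bind: -28"
print(decode_bind_failure(line))
```

On Linux, errno 28 is ENOSPC, which is consistent with the preceding "GTT full, but LRU list empty" message: the driver found no room in the GTT to bind the object and nothing it could evict.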
Comment 17 Soeren Sonnenburg 2009-08-28 04:50:45 UTC
I haven't seen any hangs since I started using 2.6.31-rc7 + recent intel git (I guess the newer 2.8.1 contains all the fixes too). So for me this issue is settled and I am closing the bug. Thanks to Zhenyu and the other intel devs for your work.

