Bug 21480 - 2.6.30-rc2 + xorg-intel-2.7.0 + DRM_I915_KMS = corruption
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.2 (2007.02)
Hardware: Other Linux (All)
Importance: medium normal
Assigned To: Eric Anholt
QA Contact: Xorg Project Team
URL: http://thread.gmane.org/gmane.comp.fr...
Depends on:
Blocks:
Reported: 2009-04-29 14:44 UTC by Alex Bennee
Modified: 2009-06-09 15:27 UTC
CC List: 3 users

See Also:


Attachments
Dmesg output from bootup to crash (tiling off) (30.60 KB, text/plain)
2009-04-29 14:44 UTC, Alex Bennee
Xorg Log File (Tiling off) (22.68 KB, text/plain)
2009-04-29 14:44 UTC, Alex Bennee
GDM display with tiling off (right of display clipped) (586.93 KB, image/jpeg)
2009-04-29 14:45 UTC, Alex Bennee
dmesg once X/GDM restarted with tiling re-enabled (124 bytes, text/plain)
2009-04-29 14:46 UTC, Alex Bennee
GDM display with tiling switched on, showing obvious corruption on the screen (631.20 KB, image/jpeg)
2009-04-29 14:47 UTC, Alex Bennee
Xorg Log File (Tiling On) (22.64 KB, text/plain)
2009-04-29 14:48 UTC, Alex Bennee
xorg.conf (3.98 KB, text/plain)
2009-04-29 14:49 UTC, Alex Bennee
lspci -v -v output showing my hardware setup (13.29 KB, text/plain)
2009-04-29 14:56 UTC, Alex Bennee
Latest dmesg output (30.48 KB, application/octet-stream)
2009-04-30 00:38 UTC, Alex Bennee
Xorg Log File (failed start) (1.02 KB, text/plain)
2009-04-30 00:40 UTC, Alex Bennee
Xorg Log File (Latest corruption and crash log) (31.03 KB, text/plain)
2009-04-30 00:41 UTC, Alex Bennee
Xorg Log File (Working system, KMS disabled) (43.88 KB, text/plain)
2009-05-01 00:12 UTC, Alex Bennee

Description Alex Bennee 2009-04-29 14:44:04 UTC
Created attachment 25268
Dmesg output from bootup to crash (tiling off)

The combination of recent Linus 2.6.30-rc2+ kernels with Kernel Mode
Setting results in screen corruption when GDM starts. Although the
screen remains responsive, X crashes when a login is attempted.
Attempts to switch to the console fail.

Further testing with the intel-drm-next tree shows that the bug
persists even with the latest kernel:

22:38 alex@danny/x86_64 [linux-2.6-drm-intel.git] >git log --pretty=oneline HEAD^.. | cat
355d7f370b51bbb6f31aaf9f98861057e1e6bbb2 drm/i915: fix up error path leak in i915_cmdbuffer

At the suggestion of the mailing list, I tried:

Option "Tiling" "false"

This prevented the vertical corruption on screen, although rendering
was still incorrect, with the right-hand side of the screen clipped.
Again, attempting to log in resulted in X crashing.

Disabling KMS results in a working system.

I've attached lspci output, my xorg.conf, failing X logs, dmesg output
and some pictures of the corruption.

Please advise if there is anything else you want me to try.
Comment 1 Alex Bennee 2009-04-29 14:44:45 UTC
Created attachment 25269
Xorg Log File (Tiling off)
Comment 2 Alex Bennee 2009-04-29 14:45:46 UTC
Created attachment 25270
GDM display with tiling off (right of display clipped)
Comment 3 Alex Bennee 2009-04-29 14:46:29 UTC
Created attachment 25271
dmesg once X/GDM restarted with tiling re-enabled
Comment 4 Alex Bennee 2009-04-29 14:47:26 UTC
Created attachment 25272
GDM display with tiling switched on, showing obvious corruption on the screen
Comment 5 Alex Bennee 2009-04-29 14:48:00 UTC
Created attachment 25273
Xorg Log File (Tiling On)
Comment 6 Alex Bennee 2009-04-29 14:49:07 UTC
Created attachment 25274
xorg.conf
Comment 7 Alex Bennee 2009-04-29 14:54:04 UTC
My setup is Gentoo; the currently relevant packages are:

*  x11-base/xorg-server
      Latest version available: 1.5.3-r5
      Latest version installed: 1.5.3-r5
      Size of files: 5,545 kB
      Homepage:      http://xorg.freedesktop.org/
      Description:   X.Org X servers
      License:       xorg-server MIT

*  x11-base/xorg-x11
      Latest version available: 7.2
      Latest version installed: 7.2
      Size of files: 0 kB
      Homepage:      http://xorg.freedesktop.org
      Description:   An X11 implementation maintained by the X.Org Foundation (meta package)
      License:       as-is

*  x11-libs/libX11
      Latest version available: 1.1.5
      Latest version installed: 1.1.5
      Size of files: 1,547 kB
      Homepage:      http://xorg.freedesktop.org/
      Description:   X.Org X11 library
      License:       libX11

*  x11-libs/libdrm
      Latest version available: 2.4.6
      Latest version installed: 2.4.6
      Size of files: 407 kB
      Homepage:      http://dri.freedesktop.org/
      Description:   X.Org libdrm library
      License:       libdrm

*  x11-drivers/xf86-video-intel
      Latest version available: 2.7.0
      Latest version installed: 2.7.0
      Size of files: 762 kB
      Homepage:      http://xorg.freedesktop.org/
      Description:   X.Org driver for Intel cards
      License:       xf86-video-intel
Comment 8 Alex Bennee 2009-04-29 14:56:32 UTC
Created attachment 25275
lspci -v -v output showing my hardware setup

I hope this isn't too rare an Intel chipset. It's only just started giving decent 3D performance again after the A17 patch went in.
Comment 9 Jesse Barnes 2009-04-29 15:30:25 UTC
Your config appears to be failing pretty hard: you don't get the memory layout debug output, and for some reason you're falling back to software rendering. Can you try removing all the lines from the intel section of your xorg.conf except the "accelmethod" "uxa" one?

The second dmesg is just one line; it might be good to get the whole thing.

A backtrace might help here too. Can you attach gdb to X right after you start it, so that when it crashes you can do a 'bt'?
Comment 10 Alex Bennee 2009-04-30 00:36:12 UTC
There isn't much in the Intel section of the xorg.conf. If I comment out everything but the UXA and Driver lines, it won't start at all.

Here is the backtrace I got once I reset the config and got the usual corruption followed by a crash:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f365f9ce6f0 (LWP 7107)]
0x00007f365bd3d118 in ?? () from /usr/lib/libdrm_intel.so.1
(gdb) bt
#0  0x00007f365bd3d118 in ?? () from /usr/lib/libdrm_intel.so.1
#1  0x00007f365bd3d017 in ?? () from /usr/lib/libdrm_intel.so.1
#2  0x00007f365bd3d4ac in ?? () from /usr/lib/libdrm_intel.so.1
#3  0x00007f365bf60bc2 in intel_batch_flush () from /usr/lib64/xorg/modules/drivers//intel_drv.so
#4  0x00007f365bf6ba1e in ?? () from /usr/lib64/xorg/modules/drivers//intel_drv.so
#5  0x00000000005134f9 in ?? ()
#6  0x00000000004ef931 in ?? ()
#7  0x000000000044c08f in BlockHandler ()
#8  0x00000000004de78b in WaitForSomething ()
#9  0x00000000004487fb in Dispatch ()
#10 0x0000000000430735 in main ()
(gdb)

If you want, I can rebuild X with debug symbols.

Comment 11 Alex Bennee 2009-04-30 00:38:58 UTC
Created attachment 25288
Latest dmesg output

Booted the machine; X failed to start. Reverted the config to the original and restarted: corruption and crash.
Comment 12 Alex Bennee 2009-04-30 00:40:50 UTC
Created attachment 25289
Xorg Log File (failed start)

This is with the "Intel" section pared back to just the Driver and AccelMethod lines. X didn't start at all. Perhaps it needed to know where my monitor was?
Comment 13 Alex Bennee 2009-04-30 00:41:45 UTC
Created attachment 25290
Xorg Log File (Latest corruption and crash log)
Comment 14 Alex Bennee 2009-04-30 06:55:18 UTC
More complete backtrace:

[Switching to Thread 0x7f1e743bc6f0 (LWP 6585)]
drm_intel_gem_bo_unreference_locked (bo=0x15d6e30) at intel_bufmgr_gem.c:585
585	intel_bufmgr_gem.c: No such file or directory.
	in intel_bufmgr_gem.c
(gdb) bt
#0  drm_intel_gem_bo_unreference_locked (bo=0x15d6e30) at intel_bufmgr_gem.c:585
#1  0x00007f1e7072b017 in drm_intel_gem_bo_unreference_locked (bo=0x1573b90) at intel_bufmgr_gem.c:561
#2  0x00007f1e7072b4ac in drm_intel_gem_bo_unreference (bo=0x1573b90) at intel_bufmgr_gem.c:599
#3  0x00007f1e7094ebc2 in intel_batch_flush (pScrn=0x1279760, flushed=<value optimized out>) at i830_batchbuffer.c:204
#4  0x00007f1e70959a1e in I830BlockHandler (i=0, blockData=0x0, pTimeout=0x7fff7c3edde8, pReadmask=0x7cb9c0) at i830_driver.c:2656
#5  0x00000000005134f9 in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, pTimeout=0x7fff7c3edde8, pReadmask=0x7cb9c0) at animcur.c:199
#6  0x00000000004ef931 in compBlockHandler (i=0, blockData=0x0, pTimeout=0x7fff7c3edde8, pReadmask=0x7cb9c0) at compinit.c:163
#7  0x000000000044c08f in BlockHandler (pTimeout=0x7fff7c3edde8, pReadmask=0x7cb9c0) at dixutils.c:383
#8  0x00000000004de78b in WaitForSomething (pClientsReady=0x1574050) at WaitFor.c:223
#9  0x00000000004487fb in Dispatch () at dispatch.c:375
#10 0x0000000000430735 in main (argc=9, argv=0x7fff7c3edfa8, envp=<value optimized out>) at main.c:438
(gdb) frame 0
#0  drm_intel_gem_bo_unreference_locked (bo=0x15d6e30) at intel_bufmgr_gem.c:585
585	in intel_bufmgr_gem.c
(gdb) info locals
bucket = (struct drm_intel_gem_bo_bucket *) 0x12836d0
tiling_mode = 0
bufmgr_gem = <value optimized out>
(gdb) p *$
The history is empty.
(gdb) p *$bucket
Attempt to take contents of a non-pointer value.
(gdb) p *bucket
$1 = {head = {prev = 0x7f1e7094ef90, next = 0x1279760}, max_entries = -1, num_entries = 1}
(gdb) x/5i $pc
0x7f1e7072b118 <drm_intel_gem_bo_unreference_locked+328>:	mov    %rdx,0x8(%rax)
0x7f1e7072b11c <drm_intel_gem_bo_unreference_locked+332>:	mov    %rdx,(%rbx)
0x7f1e7072b11f <drm_intel_gem_bo_unreference_locked+335>:	add    $0x18,%rsp
0x7f1e7072b123 <drm_intel_gem_bo_unreference_locked+339>:	pop    %rbx
0x7f1e7072b124 <drm_intel_gem_bo_unreference_locked+340>:	pop    %rbp
(gdb) p/x $rdx
$2 = 0x15d6eb0
(gdb) p/x $rax
$3 = 0x7f1e7094ef90
(gdb) directory ~alex/src/gentoo/tmp/portage/x11-libs/libdrm-2.4.6/work/libdrm-2.4.6/libdrm
Source directories searched: /home/alex/src/gentoo/tmp/portage/x11-libs/libdrm-2.4.6/work/libdrm-2.4.6/libdrm:$cdir:$cwd
(gdb) frame 0
#0  drm_intel_gem_bo_unreference_locked (bo=0x15d6e30) at intel_bufmgr_gem.c:585
585		    DRMLISTADDTAIL(&bo_gem->head, &bucket->head);
(gdb) l
580		    bo_gem->validate_index = -1;
581		    bo_gem->relocs = NULL;
582		    bo_gem->reloc_target_bo = NULL;
583		    bo_gem->reloc_count = 0;
584	
585		    DRMLISTADDTAIL(&bo_gem->head, &bucket->head);
586		    bucket->num_entries++;
587		} else {
588		    drm_intel_gem_bo_free(bo);
589		}
(gdb) 
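To make the listing above easier to follow, here is a minimal C sketch of the logic it appears to implement. When the last reference to a buffer object goes away, the references it holds through relocations are dropped first (this would explain the recursion from frame #1 into frame #0 in the backtrace). The object is then either appended to a reuse bucket or freed. Everything here (`bo`, `bucket`, `bo_unreference`, `list_add_tail`) is a hypothetical simplification, not libdrm's code. A single bucket stands in for libdrm's size-indexed buckets, and treating `max_entries == -1` as "unlimited" is an assumption prompted by the value gdb printed.

```c
#include <assert.h>
#include <stdlib.h>

/* Circular doubly linked list node with the same {prev, next} layout
 * that gdb printed for drmMMListHead. */
typedef struct list_head {
    struct list_head *prev;
    struct list_head *next;
} list_head;

/* An empty list is a head that points at itself in both directions. */
static void list_init(list_head *head)
{
    head->prev = head;
    head->next = head;
}

/* Insert item just before head, i.e. at the tail (cf. DRMLISTADDTAIL). */
static void list_add_tail(list_head *item, list_head *head)
{
    item->next = head;
    item->prev = head->prev;
    head->prev->next = item;
    head->prev = item;
}

/* Hypothetical, heavily simplified buffer object. */
typedef struct bo {
    list_head head;               /* links the bo into a reuse bucket */
    int refcount;
    int reloc_count;
    struct bo **reloc_target_bo;  /* bos referenced via relocations */
} bo;

/* One reuse bucket; the real library keeps several, indexed by size. */
typedef struct bucket {
    list_head head;
    int max_entries;              /* assumption: -1 means "no limit" */
    int num_entries;
} bucket;

static int bos_freed;             /* counts real frees, for testing */

static void bo_unreference(bo *b, bucket *bk)
{
    assert(b->refcount > 0);      /* catch a double unreference early */
    if (--b->refcount > 0)
        return;

    /* Drop references held through relocations first (recursive). */
    for (int i = 0; i < b->reloc_count; i++)
        bo_unreference(b->reloc_target_bo[i], bk);
    free(b->reloc_target_bo);
    b->reloc_target_bo = NULL;
    b->reloc_count = 0;

    if (bk->max_entries == -1 || bk->num_entries < bk->max_entries) {
        /* Keep the bo for reuse: the step that crashed at line 585. */
        list_add_tail(&b->head, &bk->head);
        bk->num_entries++;
    } else {
        free(b);
        bos_freed++;
    }
}
```

With a bucket limited to one entry, unreferencing a batch that holds one relocation target caches the target and frees the batch itself.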

Comment 15 Alex Bennee 2009-04-30 08:16:42 UTC
OK, after a little unwrapping of macros, I get:

Program received signal SIGSEGV, Segmentation fault.
0x00007fc30a950921 in drm_intel_gem_bo_unreference_locked (bo=0x2862eb0) at intel_bufmgr_gem.c:591
591                     (&bucket->head)->prev->next = (&bo_gem->head);
(gdb) x/5i $pc
0x7fc30a950921 <drm_intel_gem_bo_unreference_locked+446>:       mov    %rax,0x8(%rdx)
0x7fc30a950925 <drm_intel_gem_bo_unreference_locked+450>:       mov    -0x18(%rbp),%rdx
0x7fc30a950929 <drm_intel_gem_bo_unreference_locked+454>:       sub    $0xffffffffffffff80,%rdx
0x7fc30a95092d <drm_intel_gem_bo_unreference_locked+458>:       mov    -0x10(%rbp),%rax
0x7fc30a950931 <drm_intel_gem_bo_unreference_locked+462>:       mov    %rdx,(%rax)
(gdb) p &bucket->head
$1 = (drmMMListHead *) 0x24fb6d0
(gdb) p *$
$2 = {prev = 0x7fc30ab75f90, next = 0x24f1760}
(gdb) p *$->prev
$3 = {prev = 0x53fd8948d0894855, next = 0x289a8b4808ec8348}
(gdb) p *$->prev->next
Cannot access memory at address 0x53fd8948d089485d

I'm a little confused by the whole ->prev->next thing myself: surely (assuming prev wasn't broken) it would always point back to &bucket->head?
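The question above matches the standard invariant of a circular doubly linked list. For a consistent head, `head->prev->next == head`, so line 591 is exactly the store that replaces that self-pointer with the new entry. The gdb output shows the head itself is damaged: `prev` points at memory whose contents are not list pointers, so the store goes through a bad pointer. The faulting store writes at displacement 0x8, which on x86_64 is the `next` field because `prev` occupies the first 8 bytes. In comment 14, `%rax` at the faulting `mov %rdx,0x8(%rax)` is 0x7f1e7094ef90, the same value gdb printed for `bucket->head.prev`.

The C sketch below checks the invariant before a tail insert. The names are hypothetical and are not libdrm functions. Note that the check only helps while the bad pointer is still readable, as it was here. If `head->prev` were unmapped, reading `head->prev->next` would fault as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct list_head {
    struct list_head *prev;
    struct list_head *next;
} list_head;

/* prev occupies the first pointer-sized slot, so next sits at offset
 * sizeof(void *): 8 on x86_64, the 0x8 displacement in the faulting mov. */
_Static_assert(offsetof(list_head, next) == sizeof(void *),
               "next follows prev directly");

static void list_init(list_head *head)
{
    head->prev = head;
    head->next = head;
}

/* In a well-formed circular list, both neighbours point back at head. */
static bool list_head_consistent(const list_head *head)
{
    return head->prev != NULL && head->next != NULL &&
           head->prev->next == head && head->next->prev == head;
}

/* Tail insertion that refuses to write through a corrupted head. */
static bool list_add_tail_checked(list_head *item, list_head *head)
{
    if (!list_head_consistent(head))
        return false;
    item->next = head;
    item->prev = head->prev;
    head->prev->next = item;   /* overwrites the pointer that was == head */
    head->prev = item;
    return true;
}
```

In a healthy list the insert succeeds and the invariant still holds afterwards. Pointing `head.prev` at unrelated memory, as in the gdb session, makes the checked insert refuse instead of corrupting further.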
Comment 16 Alex Bennee 2009-04-30 08:46:01 UTC
Reassigning to anholt following an IRC conversation with jbarnes, as the crash looks very libdrm-related.
Comment 17 Alex Bennee 2009-05-01 00:12:43 UTC
Created attachment 25328
Xorg Log File (Working system, KMS disabled)

Following discussion on IRC and a comparison with a working setup
(i.e. with KMS disabled), the following seems to be the case:

1. With KMS enabled, the intel driver doesn't get any memory allocation.
2. This should probably be a fatal error rather than just an informational message.
3. In this state, libdrm can see a corrupt memory list and crash.

However, 2 and 3 are probably symptoms of 1.

Let me know if there is any more information I can provide.
Comment 18 Alex Bennee 2009-05-24 13:34:20 UTC
This behaviour seems to have changed on the latest intel-drm-next: I can now get a working login. However, video playback locks up the X server, and mode switching doesn't seem to work. dmesg shows:

0.272044] Call Trace:
0.272053]  [<ffffffff804bfce6>] ? thread_return+0x3e/0xaa
0.272058]  [<ffffffff804c0589>] __mutex_lock_slowpath+0xdf/0x129
0.272062]  [<ffffffff804c092d>] mutex_lock+0x23/0x3b
0.272067]  [<ffffffff803cecd3>] i915_gem_retire_work_handler+0x2d/0x6b
0.272073]  [<ffffffff80247666>] worker_thread+0x176/0x20f
0.272078]  [<ffffffff803ceca6>] ? i915_gem_retire_work_handler+0x0/0x6b
0.272083]  [<ffffffff8024b463>] ? autoremove_wake_function+0x0/0x3d
0.272087]  [<ffffffff802474f0>] ? worker_thread+0x0/0x20f
0.272091]  [<ffffffff802474f0>] ? worker_thread+0x0/0x20f
0.272095]  [<ffffffff8024b061>] kthread+0x5b/0x88
0.272101]  [<ffffffff8020c0ba>] child_rip+0xa/0x20
0.272105]  [<ffffffff8024b006>] ? kthread+0x0/0x88
0.272109]  [<ffffffff8020c0b0>] ? child_rip+0x0/0x20

Should I close this bug and raise a new one?
Comment 19 Gordon Jin 2009-06-09 15:27:41 UTC
(In reply to comment #18)
> This behaviour seems to have progressed since on the latest intel-drm-next. I
> can now get a working login. However video playback locks the X server and mode
> switching doesn't seem to work. dmesg shows:
> 0.272044] Call Trace:
> 0.272053]  [<ffffffff804bfce6>] ? thread_return+0x3e/0xaa
> 0.272058]  [<ffffffff804c0589>] __mutex_lock_slowpath+0xdf/0x129
> 0.272062]  [<ffffffff804c092d>] mutex_lock+0x23/0x3b
> 0.272067]  [<ffffffff803cecd3>] i915_gem_retire_work_handler+0x2d/0x6b
> 0.272073]  [<ffffffff80247666>] worker_thread+0x176/0x20f
> 0.272078]  [<ffffffff803ceca6>] ? i915_gem_retire_work_handler+0x0/0x6b
> 0.272083]  [<ffffffff8024b463>] ? autoremove_wake_function+0x0/0x3d
> 0.272087]  [<ffffffff802474f0>] ? worker_thread+0x0/0x20f
> 0.272091]  [<ffffffff802474f0>] ? worker_thread+0x0/0x20f
> 0.272095]  [<ffffffff8024b061>] kthread+0x5b/0x88
> 0.272101]  [<ffffffff8020c0ba>] child_rip+0xa/0x20
> 0.272105]  [<ffffffff8024b006>] ? kthread+0x0/0x88
> 0.272109]  [<ffffffff8020c0b0>] ? child_rip+0x0/0x20
> Should I close this bug and raise a new one?

Yes, I think so. You can refer to http://intellinuxgraphics.org/how_to_report_bug.html when filing bugs. Thanks.