Using Xorg 7.2 with xorg-server 1.3 (also 1.2) and Nouveau DRM and DDX from git May 12th 2007, starting sylpheed-claws leads to a deterministic segfault with the following backtrace:

#0  0x00002b7f8689c237 in NVUploadToScreen (pDst=0xb782b0, x=0, y=0, w=16, h=5, src=0x2b7f8e293440 "žžÌÿžžÌÿžžÌÿ\005", src_pitch=6144) at nv_exa.c:351
#1  0x00002b7f883b94b1 in exaPutImage (pDrawable=0xb782b0, pGC=0x7c8300, depth=<value optimized out>, x=-784, y=-48, w=<value optimized out>, h=64, leftPad=0, format=2, bits=0x2b7f8e23a000 "") at exa_accel.c:206
#2  0x0000000000522fdb in damagePutImage (pDrawable=0xb782b0, pGC=0x7c8300, depth=32, x=-784, y=-48, w=1536, h=64, leftPad=0, format=2, pImage=0x2b7f8e23a000 "") at damage.c:786
#3  0x00000000004f7232 in miShmPutImage (dst=0xb77e00, pGC=0xb28030, depth=32, format=2, w=1536, h=64, sx=-784, sy=48, sw=16, sh=16, dx=0, dy=0, data=0x2b7f8e23a000 "") at shm.c:520
#4  0x00000000004f83af in ProcShmPutImage (client=0x9bfda0) at shm.c:881
#5  0x00000000004f91ac in ProcShmDispatch (client=0x2b7f8c4c7bc0) at shm.c:1114
#6  0x0000000000449ffa in Dispatch () at dispatch.c:457
#7  0x000000000043320b in main (argc=7, argv=0x7fff2585fb88, envp=<value optimized out>) at main.c:445

The offending line nv_exa.c:351:

memcpy(pNv->AGPScratch->map, src, nlines*src_pitch);

Apparently this memcpy is inlined as:

0x00002b7f8689c237 <NVUploadToScreen+295>: rep movsb %ds:(%rsi),%es:(%rdi)

%ds = 0x0
%es = 0x0
%rsi = 0x2b7f8e29a000
%rdi = 0x2b7f8c4c7bc0
src = 0x2b7f8e293440
pNv->AGPScratch->map = 0x2b7f8c4c1000

Computing from these, SEGV triggers on the 5th line at byte 3008 (or byte 27584 in total). I have not verified these numbers are the same every time, but the backtrace is the same.

This bug appeared when I updated from Xorg 7.1 to Xorg 7.2.

Using Option "EXANoUploadToScreen" "true" in the Device section does circumvent this, but the desktop becomes very sluggish, and then I can hit another bug, which does not seem Nouveau related.
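The offset arithmetic in the comment above can be rechecked with a small C sketch. The constants are copied from the register dump and backtrace; the struct and the helper name locate_fault are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the register dump and backtrace above. */
#define FAULT_RSI UINT64_C(0x2b7f8e29a000) /* %rsi at the faulting rep movsb */
#define SRC_START UINT64_C(0x2b7f8e293440) /* src argument of NVUploadToScreen */
#define SRC_PITCH UINT64_C(6144)           /* src_pitch argument, bytes per row */

/* Hypothetical helper type: where inside the source image the read faulted. */
struct fault_pos {
    uint64_t total;       /* bytes copied before the fault */
    uint64_t row;         /* zero-based row index */
    uint64_t byte_in_row; /* offset inside that row */
};

static struct fault_pos locate_fault(uint64_t fault_addr, uint64_t src,
                                     uint64_t pitch)
{
    struct fault_pos p;

    assert(pitch > 0 && fault_addr >= src);
    p.total = fault_addr - src;
    p.row = p.total / pitch;
    p.byte_in_row = p.total % pitch;
    return p;
}
```

Calling locate_fault(FAULT_RSI, SRC_START, SRC_PITCH) gives total = 27584, row = 4 (the 5th line) and byte_in_row = 3008, matching the numbers above. The faulting source address is also page-aligned (its low 12 bits are zero), which would be consistent with the read running off the end of a mapping, although the backtrace alone does not prove that.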
Finally I had some time to do some poking, but somehow I could not get gdb to understand nouveau_drv.so anymore. Any hints?

Anyway, I found that the following patch for the DDX seems to eliminate this segmentation fault of the X server.

--- a/src/nv_exa.c
+++ b/src/nv_exa.c
@@ -345,10 +345,14 @@ static Bool NVUploadToScreen(PixmapPtr pDst,
 	while (h > 0) {
 		NVDEBUG(" max_lines=%d, h=%d\n", max_lines, h);
 		int nlines = h > max_lines ? max_lines : h;
+		int nadj = src_pitch - line_length;
 		/* reset the notification object */
 		memset(pNv->Notifier0->map, 0xff, pNv->Notifier0->size);
-		memcpy(pNv->AGPScratch->map, src, nlines*src_pitch);
+		if(h - nlines > 0)
+			memcpy(pNv->AGPScratch->map, src, nlines*src_pitch);
+		else
+			memcpy(pNv->AGPScratch->map, src, nlines*src_pitch -
+				nadj);
 		NVDmaStart(pNv, NvSubMemFormat, MEMFORMAT_NOTIFY, 1);
 		NVDmaNext (pNv, 0);

I do not propose this as the fix, because I do not really know what NVUploadToScreen should do. This patch prevents copying the stride padding of the last pixel row. I hope someone knows what to do with this.
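The size computation in the patch above can be written as a standalone C sketch. This is not the driver code: upload_chunk_bytes is a hypothetical helper, and line_length is assumed to be the number of meaningful bytes per row (width times bytes per pixel), which the patch context does not show.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the patched memcpy length.
 * Every chunk except the last copies nlines full strides. The last chunk
 * stops after the pixel data of its final row, so the trailing stride
 * padding (src_pitch - line_length bytes) is never read.
 */
static size_t upload_chunk_bytes(int nlines, int src_pitch, int line_length,
                                 int is_last_chunk)
{
    assert(nlines > 0 && line_length > 0 && src_pitch >= line_length);

    if (!is_last_chunk)
        return (size_t)nlines * (size_t)src_pitch;

    /* Same value as nlines*src_pitch - nadj in the patch. */
    return (size_t)(nlines - 1) * (size_t)src_pitch + (size_t)line_length;
}
```

With the values from the first backtrace (h=5, src_pitch=6144) and an assumed line_length of 64 bytes (16 pixels at 4 bytes each), the last chunk copies 24640 bytes instead of 30720. That ends before byte 27584, where the original copy faulted.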
With this patch applied, sylpheed-claws starts all the way and can even run for a while, but sooner rather than later it will hit either SIGFPE in NVUploadToScreen, or SIGABRT, probably due to an illegal free(), like this:

(gdb) bt
#0  0x00002ac5f9842885 in raise () from /lib/libc.so.6
#1  0x00002ac5f9843b3e in abort () from /lib/libc.so.6
#2  0x00002ac5f9878a27 in __libc_message () from /lib/libc.so.6
#3  0x00002ac5f987db1d in malloc_printerr () from /lib/libc.so.6
#4  0x00002ac5f987f146 in free () from /lib/libc.so.6
#5  0x000000000061161e in Xfree (ptr=0xc9cc50) at utils.c:1470
#6  0x00002ac5fbfa612c in fbDestroyPixmap (pPixmap=0xc9cc50) at fbpixmap.c:105
#7  0x00002ac5fc0cb7f3 in exaDestroyPixmap (pPixmap=0xc9cc50) at exa.c:218
#8  0x00000000005a3dc2 in damageDestroyPixmap (pPixmap=0xc9cc50) at damage.c:1628
#9  0x00002ac5f9fc73b1 in XvDestroyPixmap (pPix=0xc9cc50) at xvmain.c:393
#10 0x000000000044dd70 in dixDestroyPixmap (value=0xc9cc50, pid=16789834) at dispatch.c:1466
#11 0x00000000004356e7 in FreeResource (id=16789834, skipDeleteFuncType=0) at resource.c:536
#12 0x000000000044e03a in ProcFreePixmap (client=0x9c5260) at dispatch.c:1540
#13 0x000000000056d0d0 in XaceCatchDispatchProc (client=0x9c5260) at xace.c:281
#14 0x000000000044b5f7 in Dispatch () at dispatch.c:457
#15 0x0000000000432dcc in main (argc=9, argv=0x7fffb1bafa18, envp=0x7fffb1bafa68) at main.c:445

Moreover, some of the fonts are garbled until the GUI widget is redrawn. I did not attach a screen capture, as these are likely a different problem from this bug.
I get the same SIGABRT (note, we're no longer talking about the segfault) backtrace with a standard Fedora 7 install, straight after typing my username into the gdm username field and hitting return:

Program received signal SIGABRT, Aborted.
0x006ed402 in __kernel_vsyscall ()
(gdb) bt
#0  0x006ed402 in __kernel_vsyscall ()
#1  0x00428fa0 in raise () from /lib/libc.so.6
#2  0x0042a8b1 in abort () from /lib/libc.so.6
#3  0x0045febb in __libc_message () from /lib/libc.so.6
#4  0x00467f41 in _int_free () from /lib/libc.so.6
#5  0x0046b580 in free () from /lib/libc.so.6
#6  0x081bb591 in Xfree (ptr=0x0) at utils.c:1470
#7  0x00fd5a4a in fbDestroyPixmap (pPixmap=0x6) at fbpixmap.c:105
#8  0x001f066b in exaDestroyPixmap (pPixmap=0x8d2c760) at exa.c:218
#9  0x0816f054 in damageDestroyPixmap (pPixmap=0x8d2c760) at damage.c:1628
#10 0x006164e0 in XvDestroyPixmap (pPix=0x8d2c760) at xvmain.c:393
#11 0x080825d5 in dixDestroyPixmap (value=0x8d2c760, pid=4194652) at dispatch.c:1466
#12 0x08072bed in FreeResource (id=4194652, skipDeleteFuncType=0) at resource.c:536
#13 0x080841d3 in ProcFreePixmap (client=0x8cfcaf0) at dispatch.c:1540
#14 0x0814efd1 in XaceCatchDispatchProc (client=0x8cfcaf0) at xace.c:281
#15 0x0808936a in Dispatch () at dispatch.c:457
#16 0x080710a5 in main (argc=10, argv=0xbfa84d24, envp=Cannot access memory at address 0xf73 ) at main.c:445
(gdb) f 6
#6  0x081bb591 in Xfree (ptr=0x0) at utils.c:1470
1470        free((char *)ptr);
(gdb) f 7
#7  0x00fd5a4a in fbDestroyPixmap (pPixmap=0x6) at fbpixmap.c:105
105         xfree(pPixmap);
(gdb) l
100     Bool
101     fbDestroyPixmap (PixmapPtr pPixmap)
102     {
103         if(--pPixmap->refcnt)
104             return TRUE;
105         xfree(pPixmap);
106         return TRUE;
107     }
108
109     #define ADDRECT(reg,r,fr,rx1,ry1,rx2,ry2) \
My backtrace isn't quite the same as pq's.

His:

#5  0x000000000061161e in Xfree (ptr=0xc9cc50) at utils.c:1470
#6  0x00002ac5fbfa612c in fbDestroyPixmap (pPixmap=0xc9cc50) at fbpixmap.c:105
#7  0x00002ac5fc0cb7f3 in exaDestroyPixmap (pPixmap=0xc9cc50) at exa.c:218

Mine:

#6  0x081bb591 in Xfree (ptr=0x0) at utils.c:1470
#7  0x00fd5a4a in fbDestroyPixmap (pPixmap=0x6) at fbpixmap.c:105
#8  0x001f066b in exaDestroyPixmap (pPixmap=0x8d2c760) at exa.c:218

I can't see where the pointer's being lost. I added debugging to exaDestroyPixmap(), printing pPixmap, and I get a valid pointer right up to the return line:

195         return fbDestroyPixmap (pPixmap);

(WW) in exaDestroyPixmap, pPixmap = 8648e30
(WW) after ExaPixmapPriv, pPixmap = 8648e30
-- 0x0x400122 (0x500000) (398x50)
Free 0x13880 -> 0x500000 (0x500000)
done freeing
(WW) after exaOffscreenFree, pPixmap = 8648e30
(WW) after REGION_UNINIT, pPixmap = 8648e30
(WW) returning, pPixmap = 8648e30

Yet when we enter fbDestroyPixmap, we have pPixmap=0x6?
Adding LogMessage()s with the pPixmap address all the way to free() seems to make the pointer stick around all the way there. I wonder what's up with that.

New backtrace:

Program received signal SIGABRT, Aborted.
0x00bd7402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00bd7402 in __kernel_vsyscall ()
#1  0x00428fa0 in raise () from /lib/libc.so.6
#2  0x0042a8b1 in abort () from /lib/libc.so.6
#3  0x0045febb in __libc_message () from /lib/libc.so.6
#4  0x00467f41 in _int_free () from /lib/libc.so.6
#5  0x0046b580 in free () from /lib/libc.so.6
#6  0x081bb5b0 in Xfree (ptr=0x95dc0b0) at utils.c:1472
#7  0x00b9fa99 in fbDestroyPixmap (pPixmap=0x95dc0b0) at fbpixmap.c:107
#8  0x0072d6a5 in exaDestroyPixmap (pPixmap=0x95dc0b0) at exa.c:220
#9  0x0816f054 in damageDestroyPixmap (pPixmap=0x95dc0b0) at damage.c:1628
#10 0x00deb4e0 in XvDestroyPixmap (pPix=0x95dc0b0) at xvmain.c:393
#11 0x080825d5 in dixDestroyPixmap (value=0x95dc0b0, pid=4194665) at dispatch.c:1466
#12 0x08072bed in FreeResource (id=4194665, skipDeleteFuncType=0) at resource.c:536
#13 0x080841d3 in ProcFreePixmap (client=0x95b69d8) at dispatch.c:1540
#14 0x0814efd1 in XaceCatchDispatchProc (client=0x95b69d8) at xace.c:281
#15 0x0808936a in Dispatch () at dispatch.c:457
#16 0x080710a5 in main (argc=10, argv=0xbfad6ff4, envp=Cannot access memory at address 0xa01 ) at main.c:445
Once more, this time with glibc-debuginfo installed. We're getting hit by the glibc malloc detection stuff.

Program received signal SIGABRT, Aborted.
0x00b2a402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00b2a402 in __kernel_vsyscall ()
#1  0x00428fa0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0x0042a8b1 in *__GI_abort () at abort.c:88
#3  0x0045febb in __libc_message (do_abort=2, fmt=0x526a44 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#4  0x00467f41 in _int_free (av=0x551120, mem=0x97dd9f8) at malloc.c:5887
#5  0x0046b580 in *__GI___libc_free (mem=0x97dd9f8) at malloc.c:3622
#6  0x081bb5b0 in Xfree (ptr=0x97dd9f8) at utils.c:1472
#7  0x00c50a99 in fbDestroyPixmap (pPixmap=0x97dd9f8) at fbpixmap.c:107
#8  0x002576a5 in exaDestroyPixmap (pPixmap=0x97dd9f8) at exa.c:220
#9  0x0816f054 in damageDestroyPixmap (pPixmap=0x97dd9f8) at damage.c:1628
#10 0x002ae4e0 in XvDestroyPixmap (pPix=0x97dd9f8) at xvmain.c:393
#11 0x080825d5 in dixDestroyPixmap (value=0x97dd9f8, pid=4194747) at dispatch.c:1466
#12 0x08072bed in FreeResource (id=4194747, skipDeleteFuncType=0) at resource.c:536
#13 0x080841d3 in ProcFreePixmap (client=0x97ae9d8) at dispatch.c:1540
#14 0x0814efd1 in XaceCatchDispatchProc (client=0x97ae9d8) at xace.c:281
#15 0x0808936a in Dispatch () at dispatch.c:457
#16 0x080710a5 in main (argc=10, argv=0xbffbba54, envp=Cannot access memory at address 0xa50 ) at main.c:445
The original bug should be fixed in latest git. However, the SIGABRT problem is a different bug (not sure if it's our fault or not).
(In reply to comment #6)
> The original bug should be fixed in latest git.
>
> However, the SIGABRT problem is a different bug (not sure if it's our fault or
> not).

With the latest commits from Ben to the DDX, I can confirm both bugs are fixed. That is, in brief testing under the conditions where they usually triggered, neither the segfault nor the abort triggers anymore.