I'm not 100% sure what triggers the bug, but I think every time there has been some sort of window movement involved, potentially across desktops (I use WindowMaker, with drag window to move to other desktop option enabled). Often it is an mplayer window (but perhaps not 100% of the time, not sure). I've seen this issue multiple times over the past few months. (Always when there's some sort of video activity though.) [521319.093] (EE) Backtrace: [521319.097] (EE) 0: /usr/bin/X (xorg_backtrace+0x34) [0x5970c4] [521319.097] (EE) 1: /usr/bin/X (0x400000+0x19aef9) [0x59aef9] [521319.097] (EE) 2: /lib64/libpthread.so.0 (0x7f5ff8670000+0x10410) [0x7f5ff8680410] [521319.097] (EE) 3: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f5ff6149000+0xa644) [0x7f5ff6153644] [521319.098] (EE) 4: /usr/bin/X (0x400000+0x95a9e) [0x495a9e] [521319.098] (EE) 5: /usr/bin/X (0x400000+0xe1ed2) [0x4e1ed2] [521319.098] (EE) 6: /usr/bin/X (0x400000+0x3b1b1) [0x43b1b1] [521319.098] (EE) 7: /usr/bin/X (0x400000+0x29c7a) [0x429c7a] [521319.098] (EE) 8: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7f5ff72f64bd] [521319.098] (EE) 9: /usr/bin/X (0x400000+0x29fd1) [0x429fd1] [521319.098] (EE) [521319.098] (EE) Segmentation fault at address 0x7f5ff36a1001 Decoding nouveau_drv.so 0xa644 gets me: NVPutImage: ... 0x000000000000a627 <+4887>: mov %rdx,%r11 0x000000000000a62a <+4890>: mov %eax,%r10d 0x000000000000a62d <+4893>: nopl (%rax) 0x000000000000a630 <+4896>: lea (%rsi,%r13,1),%rdi 0x000000000000a634 <+4900>: xor %edx,%edx 0x000000000000a636 <+4902>: test %r9d,%r9d 0x000000000000a639 <+4905>: jle 0xaa2a <NVPutImage+5914> 0x000000000000a63f <+4911>: nop 0x000000000000a640 <+4912>: movzbl (%rdi,%rdx,2),%eax --> 0x000000000000a644 <+4916>: movzbl 0x1(%rsi,%rdx,2),%ecx 0x000000000000a649 <+4921>: shl $0x8,%eax 0x000000000000a64c <+4924>: shl $0x10,%ecx 0x000000000000a64f <+4927>: or %ecx,%eax 0x000000000000a651 <+4929>: movzbl (%rsi,%rdx,2),%ecx 0x000000000000a655 <+4933>: or %ecx,%eax 0x000000000000a657 <+4935>: movzbl 0x1(%rdi,%rdx,2),%ecx 0x000000000000a65c <+4940>: shl $0x18,%ecx 0x000000000000a65f <+4943>: or %ecx,%eax 0x000000000000a661 <+4945>: mov %eax,(%r8,%rdx,4) 0x000000000000a665 <+4949>: add $0x1,%rdx 0x000000000000a669 <+4953>: cmp %edx,%r9d 0x000000000000a66c <+4956>: jg 0xa640 <NVPutImage+4912> 0x000000000000a66e <+4958>: lea 0x0(%r13,%rbp,1),%rdi And doing a compile of nouveau_xv.c with -g -ggdb lets me track this down to NVCopyNV12ColorPlanes, specifically: *vuvud++ = vs[0] | (us[0]<<8) | (vs[1]<<16) | (us[1]<<24); Even though it's inlined, gcc is quite clever and it's unclear which of the two invocations from NVPutImage it really is (it just jumps from both invocations to the same-ish place). What's really odd is that it's the load of vs[1] that's breaking. Looking closely, it seems like vs[1] is accessed before vs[0] though, so the entire vs pointer may be wrong. It looks like both vs and us are offsets into the buf passed into NVPutImage... but I don't see a length, perhaps there's a way to tell if we've gone too far? Here are some versions of some things I'm running: x11-drivers/xf86-video-nouveau-1.0.4 x11-base/xorg-server-1.13.1 media-libs/mesa-9.0.1 I know that there are later xf86-video-nouveau drivers, but there don't appear to have been any changes to nouveau_xv.c.
Reading the code some more, higher up in the call chain (ProcXv{Shm,}PutImage), a call to NVQueryImageAttributes is made to determine how many bytes the source image needs to be, and if there aren't enough bytes, we never even make it to NVPutImage. So this means that we're looking for some sort of inconsistency between NVQueryImageAttributes and how far into the source buffer NVPutImage reads. Sadly the logic in NVPutImage is separate, and is much more complex. By the way, it occurs to me that I didn't mention what hardware I'm using, in case it matters: 02:00.0 VGA compatible controller: NVIDIA Corporation G96 [GeForce 9500 GT] (rev a1) And I'm using x11-libs/libdrm-2.4.40 (which nouveau_xv.c depends on via nouveau_bo_*)
A little more info: I added code to call NVQueryImageAttributes inside of NVPutImage, compute an end pointer (buf + size), and then check inside of NVCopyNV12ColorPlanes at the end of every loop iteration whether either us or vs have gone off the end. And it seems like they do! When I move the mplayer window s.t. part of it is off-screen (on the left), the code ends up accessing 2 bytes further than the end of the array! There happens to be another mapping afterwards which means that there's no segfault, but if that mapping isn't there, a segfault would have occurred. Now, it only ever goes over by 1-3 bytes, never more. One thing that I noticed is that we pass in line_len to NVCopyNV12ColorPlanes as the width (which is rounded up to 8 on NV_50 and up) rather than npixels (which is rounded up to 4). I also wonder if there's some issue in how left is computed (and then applied to the s2/3 offsets)...
Looks like my npixels vs line_len guess was right: [741217.047] (WW) NOUVEAU(0): Inconsistency between NVPutImage and NVQueryImageAttributes: src_x: 0, src_y: 0, src_w: 512, src_h: 384, top: 0, left: 102, bottom: 384, right: 512, npixels: 410, nlines: 384, line_len: 416, srcPitch2: 256, s2offset: 196659, s3offset: 245811, bytes: 49104, size_tmp: 294912, over: 3 [741217.047] (WW) NOUVEAU(0): Went too far: us: 0x7f7e20b6c002, vs: 0x7f7e20b78002, end (non-incl): 0x7f7e20b78000 And the reason I only see it when the picture is off-screen is because normally the width is already a multiple of 8. Not sure what the right way of fixing it is though... I think we can just use npixels instead of line_len as the width parameter. However that will leave the last 1-3 pixels in the dst buffer uninitialized. Not sure if that's OK.
You can leave the last few pixels uninitialized. The last few pixels already contained garbage, and it's up to the implementation to make sure they're not included when rendering the picture. :)
Created attachment 78944 [details] [review] Use npixels instead of line_len for the width to avoid going too far
Thank you for debugging and fixing this Ilia ^_^ Your patch has been pushed to master
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.