Bug 63263

Summary: X server crash in nouveau_xv.c:NVPutImage (NVCopyNV12ColorPlanes)
Product: xorg Reporter: Ilia Mirkin <imirkin>
Component: Driver/nouveauAssignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Use npixels instead of line_len for the width to avoid going too far none

Description Ilia Mirkin 2013-04-08 12:01:30 UTC
I'm not 100% sure what triggers the bug, but I think every time there has been some sort of window movement involved, potentially across desktops (I use WindowMaker, with drag window to move to other desktop option enabled). Often it is an mplayer window (but perhaps not 100% of the time, not sure). I've seen this issue multiple times over the past few months. (Always when there's some sort of video activity though.)

[521319.093] (EE) Backtrace:
[521319.097] (EE) 0: /usr/bin/X (xorg_backtrace+0x34) [0x5970c4]
[521319.097] (EE) 1: /usr/bin/X (0x400000+0x19aef9) [0x59aef9]
[521319.097] (EE) 2: /lib64/libpthread.so.0 (0x7f5ff8670000+0x10410) [0x7f5ff8680410]
[521319.097] (EE) 3: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f5ff6149000+0xa644) [0x7f5ff6153644]
[521319.098] (EE) 4: /usr/bin/X (0x400000+0x95a9e) [0x495a9e]
[521319.098] (EE) 5: /usr/bin/X (0x400000+0xe1ed2) [0x4e1ed2]
[521319.098] (EE) 6: /usr/bin/X (0x400000+0x3b1b1) [0x43b1b1]
[521319.098] (EE) 7: /usr/bin/X (0x400000+0x29c7a) [0x429c7a]
[521319.098] (EE) 8: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7f5ff72f64bd]
[521319.098] (EE) 9: /usr/bin/X (0x400000+0x29fd1) [0x429fd1]
[521319.098] (EE) 
[521319.098] (EE) Segmentation fault at address 0x7f5ff36a1001

Decoding nouveau_drv.so 0xa644 gets me:
NVPutImage:
...
   0x000000000000a627 <+4887>:  mov    %rdx,%r11
   0x000000000000a62a <+4890>:  mov    %eax,%r10d
   0x000000000000a62d <+4893>:  nopl   (%rax)
   0x000000000000a630 <+4896>:  lea    (%rsi,%r13,1),%rdi
   0x000000000000a634 <+4900>:  xor    %edx,%edx
   0x000000000000a636 <+4902>:  test   %r9d,%r9d
   0x000000000000a639 <+4905>:  jle    0xaa2a <NVPutImage+5914>
   0x000000000000a63f <+4911>:  nop
   0x000000000000a640 <+4912>:  movzbl (%rdi,%rdx,2),%eax
-->   0x000000000000a644 <+4916>:  movzbl 0x1(%rsi,%rdx,2),%ecx
   0x000000000000a649 <+4921>:  shl    $0x8,%eax
   0x000000000000a64c <+4924>:  shl    $0x10,%ecx
   0x000000000000a64f <+4927>:  or     %ecx,%eax
   0x000000000000a651 <+4929>:  movzbl (%rsi,%rdx,2),%ecx
   0x000000000000a655 <+4933>:  or     %ecx,%eax
   0x000000000000a657 <+4935>:  movzbl 0x1(%rdi,%rdx,2),%ecx
   0x000000000000a65c <+4940>:  shl    $0x18,%ecx
   0x000000000000a65f <+4943>:  or     %ecx,%eax
   0x000000000000a661 <+4945>:  mov    %eax,(%r8,%rdx,4)
   0x000000000000a665 <+4949>:  add    $0x1,%rdx
   0x000000000000a669 <+4953>:  cmp    %edx,%r9d
   0x000000000000a66c <+4956>:  jg     0xa640 <NVPutImage+4912>
   0x000000000000a66e <+4958>:  lea    0x0(%r13,%rbp,1),%rdi

And doing a compile of nouveau_xv.c with -g -ggdb lets me track this down to NVCopyNV12ColorPlanes, specifically:

*vuvud++ = vs[0] | (us[0]<<8) | (vs[1]<<16) | (us[1]<<24);

Even though it's inlined, gcc is quite clever and it's unclear which of the two invocations from NVPutImage it really is (it just jumps from both invocations to the same-ish place).

What's really odd is that it's the load of vs[1] that's breaking. Looking closely, it seems like vs[1] is accessed before vs[0] though, so the entire vs pointer may be wrong.

It looks like both vs and us are offsets into the buf passed into NVPutImage... but I don't see a length, perhaps there's a way to tell if we've gone too far?

Here are some versions of some things I'm running:

x11-drivers/xf86-video-nouveau-1.0.4
x11-base/xorg-server-1.13.1
media-libs/mesa-9.0.1

I know that there are later xf86-video-nouveau drivers, but there don't appear to have been any changes to nouveau_xv.c.
Comment 1 Ilia Mirkin 2013-04-10 16:57:17 UTC
Reading the code some more, higher up in the call chain (ProcXv{Shm,}PutImage), a call to NVQueryImageAttributes is made to determine how many bytes the source image needs to be, and if there aren't enough bytes, we never even make it to NVPutImage.

So this means that we're looking for some sort of inconsistency between NVQueryImageAttributes and how far into the source buffer NVPutImage reads. Sadly the logic in NVPutImage is separate, and is much more complex.

By the way, it occurs to me that I didn't mention what hardware I'm using, in case it matters:

02:00.0 VGA compatible controller: NVIDIA Corporation G96 [GeForce 9500 GT] (rev a1)

And I'm using x11-libs/libdrm-2.4.40 (which nouveau_xv.c depends on via nouveau_bo_*)
Comment 2 Ilia Mirkin 2013-04-11 00:00:26 UTC
A little more info:

I added code to call NVQueryImageAttributes inside of NVPutImage, compute an end pointer (buf + size), and then check inside of NVCopyNV12ColorPlanes at the end of every loop iteration whether either us or vs have gone off the end.

And it seems like they do! When I move the mplayer window s.t. part of it is off-screen (on the left), the code ends up accessing 2 bytes further than the end of the array! There happens to be another mapping afterwards which means that there's no segfault, but if that mapping isn't there, a segfault would have occurred.

Now, it only ever goes over by 1-3 bytes, never more. One thing that I noticed is that we pass in line_len to NVCopyNV12ColorPlanes as the width (which is rounded up to 8 on NV_50 and up) rather than npixels (which is rounded up to 4). I also wonder if there's some issue in how left is computed (and then applied to the s2/3 offsets)...
Comment 3 Ilia Mirkin 2013-04-11 02:06:59 UTC
Looks like my npixels vs line_len guess was right:

[741217.047] (WW) NOUVEAU(0): Inconsistency between NVPutImage and NVQueryImageAttributes: src_x: 0, src_y: 0, src_w: 512, src_h: 384, top: 0, left: 102, bottom: 384, right: 512, npixels: 410, nlines: 384, line_len: 416, srcPitch2: 256, s2offset: 196659, s3offset: 245811, bytes: 49104, size_tmp: 294912, over: 3
[741217.047] (WW) NOUVEAU(0): Went too far: us: 0x7f7e20b6c002, vs: 0x7f7e20b78002, end (non-incl): 0x7f7e20b78000

And the reason I only see it when the picture is off-screen is because normally the width is already a multiple of 8. Not sure what the right way of fixing it is though... I think we can just use npixels instead of line_len as the width parameter. However that will leave the last 1-3 pixels in the dst buffer uninitialized. Not sure if that's OK.
Comment 4 Maarten Lankhorst 2013-04-29 13:48:50 UTC
You can leave the last few pixels uninitialized. The last few pixels already contained garbage, and it's up to the implementation to make sure they're not included when rendering the picture. :)
Comment 5 Ilia Mirkin 2013-05-06 20:51:11 UTC
Created attachment 78944 [details] [review]
Use npixels instead of line_len for the width to avoid going too far
Comment 6 Emil Velikov 2013-05-08 14:53:16 UTC
Thank you for debugging and fixing this Ilia ^_^

Your patch has been pushed to master

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.