Bug 62914

Summary: ZaphodHeads doesn't work after upgrading to xorg 1.14
Product: xorg Reporter: Damian Nowak <nowaker>
Component: Driver/nouveau Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: medium CC: jasonbstubbs, max, nowaker
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments:
  Description                              Flags
  My Xorg.log                              none
  My Xorg.0.log                            none
  disassemble of segfault location         none
  Disassembly of drmmode_set_mode_major    none

Description Damian Nowak 2013-03-29 19:01:13 UTC
CASE 1 (MAIN):

My 2-GPU, 3-monitor setup stopped working after upgrading xorg to 1.14. I guess xorg is to blame, as the same happened with version 1.13: https://bugs.freedesktop.org/show_bug.cgi?id=56347
I tried using upstream nouveau as well.

[2013-03-29 16:43] upgraded xf86-video-nouveau (1.0.6-1 -> 1.0.7-1)
[2013-03-29 16:43] upgraded xorg-server (1.13.2.901-1 -> 1.14.0-2)

Backtrace from upstream nouveau, commit 6771424d79e541d2fa7253a582db3dc9108fd97d.

[  2312.187] (EE) 0: /usr/bin/X (xorg_backtrace+0x36) [0x58a846]
[  2312.187] (EE) 1: /usr/bin/X (0x400000+0x18e669) [0x58e669]
[  2312.187] (EE) 2: /usr/lib/libpthread.so.0 (0x7f1738fa2000+0xf1e0) [0x7f1738fb11e0]
[  2312.187] (EE) 3: /usr/local/lib/xorg/modules/drivers/nouveau_drv.so (0x7f1733dc0000+0x25e30) [0x7f1733de5e30]
[  2312.187] (EE) 4: /usr/bin/X (xf86CrtcSetModeTransform+0x14d) [0x4ae31d]
[  2312.187] (EE) 5: /usr/bin/X (xf86SetDesiredModes+0x1b8) [0x4aeb58]
[  2312.187] (EE) 6: /usr/local/lib/xorg/modules/drivers/nouveau_drv.so (0x7f1733dc0000+0xe7d1) [0x7f1733dce7d1]
[  2312.188] (EE) 7: /usr/local/lib/xorg/modules/drivers/nouveau_drv.so (0x7f1733dc0000+0xf126) [0x7f1733dcf126]
[  2312.188] (EE) 8: /usr/bin/X (0x400000+0xabbae) [0x4abbae]
[  2312.188] (EE) 9: /usr/bin/X (0x400000+0x2674d) [0x42674d]
[  2312.188] (EE) 10: /usr/lib/libc.so.6 (__libc_start_main+0xf5) [0x7f1737e2ea15]
[  2312.188] (EE) 11: /usr/bin/X (0x400000+0x26bdd) [0x426bdd]

Can you please investigate what could cause the error?

I looked at other projects reporting xorg 1.14 failures; maybe they are related to my nouveau problem? Just a guess.
- https://bugs.freedesktop.org/show_bug.cgi?id=62112
- https://bugs.freedesktop.org/show_bug.cgi?id=62773

Files:
- full log:
  http://upload.nowaker.net/nwkr/1364581970_Xorg.0.log-HEAD
- my xorg.conf.3monitors - the same for two years:
  http://wklej.org/id/555276/txt/ 


CASE 2 (AUX):

When falling back to my 1-GPU, 2-monitor setup with ZaphodHeads, only one monitor works. This may be interesting:

[  2345.932] (EE) NOUVEAU(1): [drm] failed to set drm interface version.
[  2345.932] (EE) NOUVEAU(1): [drm] error opening the drm
[  2345.932] (EE) NOUVEAU(1): 820: 

Files:
- full log:
  http://upload.nowaker.net/nwkr/1364582818_Xorg.0.log-2monitors
- my xorg.conf.2monitors:
  http://upload.nowaker.net/nwkr/1364582578_xorg.conf.2monitory


As usual, patches will be appreciated - I will test.
My environment: Arch Linux, kernel 3.8.4-1-ARCH.
Comment 1 Damian Nowak 2013-03-29 19:04:43 UTC
To be clear, the message "failed to set drm interface version" appears in CASE 1 as well. That's why these two cases are considered one bug.
Comment 2 Maxim P. Dementiev 2013-04-08 15:13:02 UTC
I've got the same problem.

X.Org X Server 1.13.1
Release Date: 2012-12-13
[  6286.574] X Protocol Version 11, Revision 0
[  6286.574] Build Operating System: Linux 3.6.11-gentoo x86_64 Gentoo
[  6286.575] Current Operating System: Linux vaio 3.7.10-gentoo #1 SMP Sun Mar 17 20:02:26 MSK 2013 x86_64
[  6286.575] Kernel command line: root=/dev/sda2
[  6286.576] Build Date: 16 February 2013  12:46:40AM
[  6286.576]  
[  6286.577] Current version of pixman: 0.28.0
.....
[  6286.581] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  6286.581] (==) ServerLayout "MyLayout"
[  6286.581] (**) |-->Screen "LCDScreen" (0)
[  6286.581] (**) |   |-->Monitor "MyVaioLCD"
[  6286.581] (**) |   |-->Device "NvidiaLCD"
[  6286.581] (**) |-->Screen "FlatronScreen" (1)
[  6286.581] (**) |   |-->Monitor "MyFlatron"
[  6286.581] (**) |   |-->Device "NvidiaCRT"
[  6286.581] (**) Option "Xinerama" "1"
.....
[  6286.582] (II) Module ABI versions:
[  6286.582] 	X.Org ANSI C Emulation: 0.4
[  6286.582] 	X.Org Video Driver: 13.1
[  6286.582] 	X.Org XInput driver : 18.0
[  6286.582] 	X.Org Server Extension : 7.0
[  6286.583] (II) config/udev: Adding drm device (/dev/dri/card0)
[  6286.585] (--) PCI:*(0:1:0:0) 10de:0a75:104d:9067 rev 162, Mem @ 0xe2000000/16777216, 0xd0000000/268435456, 0xe0000000/33554432, I/O @ 0x0000d000/128, BIOS @ 0x????????/524288
[  6286.585] (II) Open ACPI successful (/var/run/acpid.socket)
.....
[  6286.597] (II) Module glx: vendor="X.Org Foundation"
[  6286.597] 	compiled for 1.13.1, module version = 1.0.0
[  6286.597] 	ABI class: X.Org Server Extension, version 7.0
[  6286.597] (==) AIGLX enabled
[  6286.598] Loading extension GLX
[  6286.598] (II) LoadModule: "nouveau"
[  6286.598] (II) Loading /usr/lib64/xorg/modules/drivers/nouveau_drv.so
[  6286.598] (II) Module nouveau: vendor="X.Org Foundation"
[  6286.598] 	compiled for 1.13.1, module version = 1.0.6
[  6286.598] 	Module class: X.Org Video Driver
[  6286.598] 	ABI class: X.Org Video Driver, version 13.1
[  6286.598] (II) NOUVEAU driver 
.....
[  6286.598] (--) using VT number 7

[  6286.600] (II) [drm] nouveau interface version: 1.1.0
[  6286.600] (II) [drm] nouveau interface version: 1.1.0
[  6286.600] (II) Loading sub module "dri"
[  6286.600] (II) LoadModule: "dri"
[  6286.600] (II) Module "dri" already built-in
[  6286.600] (II) NOUVEAU(0): Loaded DRI module
[  6286.600] (--) NOUVEAU(0): Chipset: "NVIDIA NVa8"
[  6286.601] (**) NOUVEAU(0): Depth 24, (--) framebuffer bpp 32
[  6286.601] (==) NOUVEAU(0): RGB weight 888
[  6286.601] (==) NOUVEAU(0): Default visual is TrueColor
[  6286.601] (**) NOUVEAU(0): Option "ZaphodHeads" "LVDS-1,VGA-1"
[  6286.601] (==) NOUVEAU(0): Using HW cursor
[  6286.601] (==) NOUVEAU(0): GLX sync to VBlank disabled.
[  6286.601] (==) NOUVEAU(0): Page flipping enabled
[  6286.601] (==) NOUVEAU(0): Swap limit set to 2 [Max allowed 2]
[  6286.646] (II) NOUVEAU(0): Output LVDS-1 using monitor section MyVaioLCD
[  6286.646] (**) NOUVEAU(0): Option "Enable" "true"
[  6286.685] (II) NOUVEAU(0): Output VGA-1 using monitor section MyFlatron
[  6286.685] (**) NOUVEAU(0): Option "Enable" "true"
[  6286.689] (II) NOUVEAU(0): EDID for output LVDS-1
[  6286.689] (II) NOUVEAU(0): Manufacturer: SNY  Model: 6fa  Serial#: 0
.....
[  6286.729] (II) NOUVEAU(0): Output LVDS-1 enabled by config file
[  6286.729] (II) NOUVEAU(0): Output VGA-1 enabled by config file
[  6286.729] (II) NOUVEAU(0): Using exact sizes for initial modes
[  6286.729] (II) NOUVEAU(0): Output LVDS-1 using initial mode 1600x900
[  6286.729] (II) NOUVEAU(0): Output VGA-1 using initial mode 1600x900
.....
[  6286.730] (II) Module shadowfb: vendor="X.Org Foundation"
[  6286.730] 	compiled for 1.13.1, module version = 1.0.0
[  6286.730] 	ABI class: X.Org ANSI C Emulation, version 0.4
[  6286.730] (II) Loading sub module "dri"
[  6286.730] (II) LoadModule: "dri"
[  6286.730] (II) Module "dri" already built-in
[  6286.730] (II) NOUVEAU(1): Loaded DRI module
[  6286.730] (EE) NOUVEAU(1): [drm] failed to set drm interface version.
[  6286.730] (EE) NOUVEAU(1): [drm] error opening the drm
[  6286.730] (EE) NOUVEAU(1): 819: 
[  6286.730] (II) UnloadModule: "nouveau"
[  6286.730] (--) Depth 24 pixmap format is 32 bpp
[  6286.733] (II) NOUVEAU(0): Opened GPU channel 0
[  6286.747] (II) NOUVEAU(0): [DRI2] Setup complete
[  6286.747] (II) NOUVEAU(0): [DRI2]   DRI driver: nouveau
[  6286.747] (II) NOUVEAU(0): [DRI2]   VDPAU driver: nouveau
[  6286.750] (II) EXA(0): Driver allocated offscreen pixmaps
Comment 3 Damian Nowak 2013-04-08 15:25:23 UTC
OK, so it looks like this time it's DRM to blame, not xorg 1.14.
Comment 4 Maxim P. Dementiev 2013-04-11 05:47:15 UTC
I've upgraded x11-drivers/xf86-video-nouveau from 1.0.6 to 1.0.7, the result is still the same.

How can I find (or narrow down) the cause?

About the kernel:
/usr/src/linux-3.7.10-gentoo/drivers/gpu/drm/nouveau/nouveau_drm.h:#define DRIVER_DATE		"20120801"
Comment 5 Maxim P. Dementiev 2013-04-15 07:19:34 UTC
Unfortunately, I cannot use stand-alone drm module with my 3.7.10 kernel:

/var/tmp/portage/x11-base/nouveau-drm-20121015/work/master/drivers/gpu/drm/drm_gem.c: In function ‘drm_gem_mmap’:
/var/tmp/portage/x11-base/nouveau-drm-20121015/work/master/drivers/gpu/drm/drm_gem.c:709:19: error: ‘VM_RESERVED’ undeclared (first use in this function)
/var/tmp/portage/x11-base/nouveau-drm-20121015/work/master/drivers/gpu/drm/drm_gem.c:709:19: note: each undeclared identifier is reported only once for each function it appears in

The new 3.8.7 kernel contains the same drm version:

/usr/src/linux-3.7.10-gentoo/drivers/gpu/drm/nouveau/nouveau_drm.h:#define DRIVER_DATE		"20120801"
/usr/src/linux-3.8.7-gentoo/drivers/gpu/drm/nouveau/nouveau_drm.h:#define DRIVER_DATE		"20120801"
Comment 6 Emil Velikov 2013-04-25 01:43:04 UTC
A _very_ wild guess: this is an existing drm race bug, exposed by recent X work.

Keep xf86-video-nouveau (ddx) at 1.0.7 and bisect X.
Note that you may need to rebuild the ddx as well.

Thanks
Comment 7 Damian Nowak 2013-05-05 00:21:49 UTC
Any instructions on how to "bisect X"? I have no idea what you meant. :)
Comment 8 Vincent Pelletier 2013-05-08 06:46:01 UTC
> Any instructions on how to "bisect X"?

http://git-scm.com/docs/git-bisect
http://git-scm.com/book/en/Git-Tools-Debugging-with-Git#Binary-Search

A bit further afield (it helped me with my first bisect, though it may be redundant with the project-agnostic docs above):
http://wiki.winehq.org/RegressionTesting
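For readers new to it: git bisect is a binary search over the commits between a known-good and a known-bad revision. A minimal C sketch of that search follows, assuming the bug appears at one commit and stays present in every later one. `first_bad_commit` and the stand-in predicate `bad_from_index` are hypothetical names; in a real bisection the "predicate" is you building X (or the ddx) at that commit and checking whether the crash happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the build of commit `idx` shows the bug. */
typedef bool (*is_bad_fn)(size_t idx, void *ctx);

/* Commits are ordered oldest..newest in [0, n). Preconditions, as git
 * bisect expects: commit 0 is good, commit n-1 is bad, and once a commit
 * is bad every later one is bad too. Returns the index of the first bad
 * commit using O(log n) calls to is_bad. */
size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1;       /* invariant: good < bad */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;                  /* like `git bisect bad`  */
        else
            good = mid;                 /* like `git bisect good` */
    }
    return bad;
}

/* Stand-in predicate for trying the search without building anything:
 * ctx points at the index of the (pretend) first bad commit. */
bool bad_from_index(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

With 100 commits in the range this needs at most 7 test builds, which is why bisecting is usually less work than it sounds.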
Comment 9 Emil Velikov 2013-05-08 17:37:02 UTC
I can reproduce the issue, but there is no "good" combination on my system, i.e. any combination of

xf86-video-nouveau -> 1.0.6-1  1.0.7-1
xorg-server -> 1.13.2.901-1  1.14.0-2

produces "Failed to set drm interface version"


As for "how to bisect", the links provided are quite nice, although I would suggest confirming exactly which package caused the issue before jumping into a blind bisection:

1. Revert xorg-server to previous version (1.13.2.901-1)
2. Keep xf86-video-nouveau 1.0.7-1, but rebuild it on top of the old/good xserver

and vice versa
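The point of changing one package at a time is that the two mixed builds isolate the culprit. As a hedged sketch (the enum and function names are made up for illustration), that reasoning can be written as a small C decision function. It assumes the new+new pair fails, as reported, and that the ddx is rebuilt against whichever xserver it runs with. The first check corresponds to the situation described above, where even the old+old pair fails, so there is no good baseline to start from.

```c
#include <assert.h>
#include <stdbool.h>

/* What the "change one package at a time" test points at. */
enum suspect {
    SUSPECT_XSERVER,      /* new xserver alone reproduces the bug       */
    SUSPECT_DDX,          /* new xf86-video-nouveau alone reproduces it */
    SUSPECT_BOTH,         /* either upgrade alone is enough             */
    SUSPECT_INTERACTION,  /* only the new+new pair fails                */
    SUSPECT_NO_BASELINE   /* old+old fails too: nothing to compare with */
};

/* Each flag is true if X showed the bug with that (xserver, ddx) pair,
 * the ddx having been rebuilt against that xserver. */
enum suspect classify(bool old_x_old_ddx, bool new_x_old_ddx,
                      bool old_x_new_ddx)
{
    if (old_x_old_ddx)
        return SUSPECT_NO_BASELINE;
    if (new_x_old_ddx && old_x_new_ddx)
        return SUSPECT_BOTH;
    if (new_x_old_ddx)
        return SUSPECT_XSERVER;
    if (old_x_new_ddx)
        return SUSPECT_DDX;
    return SUSPECT_INTERACTION;
}
```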

Note: to avoid rebuilding the input drivers, have SSH handy and watch Xorg.log for the offending lines.

Note: if unsure about the specific build options, patches, etc., take a look at your distro's packaging.
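Watching the log over SSH usually just means running grep for "(EE)" on the Xorg log (typically /var/log/Xorg.0.log). For completeness, here is the same filter as a minimal C function. `print_error_lines` is a hypothetical helper, not part of any X tool, and it only assumes the "(EE)" marker Xorg uses for error lines, as seen in the logs in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy every line of `log` containing Xorg's "(EE)" error marker to `out`
 * (pass NULL to only count them). Returns the number of matching lines.
 * Lines longer than the buffer are read in pieces; "(EE)" sits near the
 * start of a line, so in practice it is seen in the first piece. */
size_t print_error_lines(FILE *log, FILE *out)
{
    char buf[4096];
    size_t matches = 0;

    while (fgets(buf, sizeof buf, log) != NULL) {
        if (strstr(buf, "(EE)") != NULL) {
            matches++;
            if (out != NULL)
                fputs(buf, out);
        }
    }
    return matches;
}
```

Pointed at the log with stdout as `out`, it would print lines such as the "[drm] error opening the drm" messages quoted earlier.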
Comment 10 Emil Velikov 2013-05-13 10:09:57 UTC
Dave pushed a fix for the issue


commit d3b52efe959f255784f5ead16d7276ca0fb4cdb1
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon May 13 13:35:12 2013 +1000

    nouveau: attempt to fix zaphod since dri1 code removal
    
    j_v on #nouveau bisected b1a630b48210d6a3c44994fce1b73273000ace5c has
    breaking zaphod, on review it was trying to open the drm fd a second time
    which was unnecessary.
    
    Avoid the problem by storing the nv fd in an entity and have share it between
    the two scrn info recs.
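The commit message describes opening the DRM device once and sharing the fd through the entity, instead of opening it a second time for the second ZaphodHeads screen (the second open is where "failed to set drm interface version" came from). A minimal sketch of that pattern in C follows. It is not the code from d3b52efe: `struct nv_entity`, `struct nv_screen`, `nv_screen_get_fd` and `fake_open_drm` are hypothetical stand-ins for the driver's entity private, per-screen record and DRM-opening code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-GPU state shared by all ZaphodHeads screens. */
struct nv_entity {
    int fd;        /* DRM fd, -1 until the first screen opens it */
    int refcount;  /* number of screens currently using fd       */
};

/* Hypothetical per-screen state. */
struct nv_screen {
    struct nv_entity *ent;
    int fd;
};

/* Opens the DRM device (and sets the interface version); returns fd or -1. */
typedef int (*open_drm_fn)(void *ctx);

/* Only the first screen on an entity opens the device; later screens
 * reuse the same fd instead of opening it again. */
int nv_screen_get_fd(struct nv_screen *scr, struct nv_entity *ent,
                     open_drm_fn open_drm, void *ctx)
{
    assert(scr != NULL && ent != NULL);
    if (ent->fd < 0) {
        int fd = open_drm(ctx);
        if (fd < 0)
            return -1;
        ent->fd = fd;
    }
    ent->refcount++;
    scr->ent = ent;
    scr->fd = ent->fd;
    return scr->fd;
}

/* Fake opener for trying the logic without hardware: counts calls and
 * hands out a made-up descriptor number. */
int fake_open_drm(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return 3;
}
```

A matching release function would decrement `refcount` and close the fd only when it reaches zero, so shutting down one head does not pull the device out from under the other.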
Comment 11 Damian Nowak 2013-05-13 12:45:54 UTC
I will validate it today.
Comment 12 MasterPrenium 2013-05-14 10:14:09 UTC
Dear guys,

I had the same issue on an NV80 GPU card.

I've just tried your patch, and it's a little better; I can now see lines like this:
[  4957.508] (II) NOUVEAU(1): Output DVI-I-2 connected

That is great and better than before, but now I get a segfault...
Comment 13 MasterPrenium 2013-05-14 10:14:59 UTC
Created attachment 79294 [details]
My Xorg.log

Here is my Xorg.log for the segfault ...
Comment 14 Emil Velikov 2013-05-14 12:07:10 UTC
A quick look at your log indicates another issue:

  (EE) AIGLX error: Calling driver entry point failed

Please open another bug with more information [1]

Cheers
Emil

[1] http://nouveau.freedesktop.org/wiki/Bugs/
Comment 15 Damian Nowak 2013-05-19 12:09:37 UTC
The fix didn't work for me. I get the very same exception as in my first comment (titled CASE 1). Using bf72ae1f6574c540f0afc2d7845d41df43507a8f.

2013-05-19 log: http://upload.nowaker.net/nwkr/1368964535_Xorg.0.log
2013-03-29 log: http://upload.nowaker.net/nwkr/1364581970_Xorg.0.log-HEAD

Indeed, MasterPrenium's error is a different issue; I have never seen such a message.
Comment 16 Emil Velikov 2013-05-19 12:23:42 UTC
Two separate issues in a single bug report :P

AFAICS the patch did resolve the second issue, although the first one seems quite specific to your system/setup.

There are two approaches you can take to tackle this:

1. Carry on with a bisection: first with xf86-video-nouveau, and after that with any other package that was updated when the breakage occurred

2. Bisect your xorg.conf, see which line(s) are causing the issue


I would recommend you try them in the order in which they are listed
Comment 17 Damian Nowak 2013-05-19 14:28:07 UTC
I am too weak to do this bisect thing. Maybe when I invite my hacker-friend for a beer I will manage to do that. ;-) But I took a look at `git log` and checked out 27a1a0616304e9b9f0ae842899b7d614f1026578 (which is actually your fix for my #56347), compiled and... it works! Will this info help you to figure out what actually happened?

Arch Linux 20130519
xorg-server 1.14.1-1
libdrm 2.4.45-1

A complete list of all X/drm/dri-related packages: http://wklej.org/id/1042829/
Comment 18 Ilia Mirkin 2013-05-19 20:02:17 UTC
An alternative to bisecting: you know where the crash is, it's at

[   360.877] (EE) 3: /usr/local/lib/xorg/modules/drivers/nouveau_drv.so (0x7f24ae751000+0x255c0) [0x7f24ae7765c0]
[   360.877] (EE) 4: /usr/bin/X (xf86CrtcSetModeTransform+0x12a) [0x4aad4a]

So load up nouveau_drv.so in gdb (gdb /usr/local/lib/xorg/modules/drivers/nouveau_drv.so)

And inside gdb, run

disassemble 0x255c0

That should tell you what function it's dying in, and the exact instructions it's dying on. This is what I did for bug 63263 (see my initial comment there), and that was able to pinpoint the crash exactly. (Of course figuring out the circumstances that lead to that condition can be trickier to work out.)
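The offset to disassemble is the frame's address minus the load address of nouveau_drv.so, which is the pair the backtrace prints as "(base+offset) [address]": 0x7f24ae7765c0 - 0x7f24ae751000 = 0x255c0. A small C helper doing that subtraction for one backtrace line, assuming the frame format shown above (`module_offset` is a hypothetical name):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Given one Xorg backtrace frame such as
 *   "nouveau_drv.so (0x7f24ae751000+0x255c0) [0x7f24ae7765c0]"
 * return address - load base, i.e. the module-relative offset to pass to
 * gdb's `disassemble` after loading the .so. Returns 0 for frames that
 * print a symbol name instead, e.g. "(xf86CrtcSetModeTransform+0x12a)",
 * or when the printed offset does not match the subtraction. */
uint64_t module_offset(const char *frame)
{
    uint64_t base, off, addr;
    const char *p = strstr(frame, " (0x");

    if (p == NULL)
        return 0;
    if (sscanf(p, " (0x%" SCNx64 "+0x%" SCNx64 ") [0x%" SCNx64 "]",
               &base, &off, &addr) != 3)
        return 0;
    return (addr - base == off) ? off : 0;
}
```

The same subtraction gives 0x263f7 for frame 3 of the backtrace in comment 22, which is the value used there.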

You can also compile the whole thing with -g, e.g. ./configure CFLAGS=-g or similar, and make sure the final binary isn't stripped. That might make gdb more cooperative if you're having trouble.
Comment 19 Damian Nowak 2013-05-19 20:08:14 UTC
@Ilia, thanks for suggestions. Leaving debug symbols sounds good. Will check it soon.
Comment 20 Jason Stubbs 2013-05-31 03:24:30 UTC
Hi,

I was getting the "[drm] error opening the drm" error, which is what led me to this bug, and d3b52efe solves it for me too. I think the crash is a different error as the "error opening the drm" error existed even on 1.0.4 whereas that version didn't have the crash.

Having said that, I'm getting the crash too and have some of the information you asked for, so, seeing it's already been discussed on this bug, I'll go ahead and post it here.

git bisect led me to this commit:


commit 1fdd7db94b55c65ea62cc9eaefff620b20e9e4ea
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Jan 7 15:28:53 2013 +1000

    nouveau: add reverse prime support

    This allows the nvidia card to scanout Intel cards rendering.

    Signed-off-by: Dave Airlie <airlied@redhat.com>


It didn't revert cleanly against master, and it wasn't clear to me how to fix the conflict, so I haven't been able to test that.

The disassemble points to the function drmmode_set_mode_major. I'll attach that output, along with my Xorg.0.log after this.
Comment 21 Jason Stubbs 2013-05-31 03:25:02 UTC
Created attachment 80077 [details]
My Xorg.0.log
Comment 22 Jason Stubbs 2013-05-31 03:28:01 UTC
Created attachment 80078 [details]
disassemble of segfault location

Backtrace:
disassemble 0x263f7 taken from frame 3 below

0: /usr/bin/X (xorg_backtrace+0x36) [0x53ccca]
1: /usr/bin/X (0x400000+0x13fe6b) [0x53fe6b]
2: /lib64/libpthread.so.0 (0x7f89dcbe8000+0x11070) [0x7f89dcbf9070]
3: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f89da6e0000+0x263f7) [0x7f89da7063f7]
4: /usr/bin/X (xf86CrtcSetModeTransform+0x14f) [0x493916]
5: /usr/bin/X (xf86SetDesiredModes+0x251) [0x493fb4]
6: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f89da6e0000+0xee51) [0x7f89da6eee51]
7: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7f89da6e0000+0xf78e) [0x7f89da6ef78e]
8: /usr/bin/X (0x400000+0x91c1e) [0x491c1e]
9: /usr/bin/X (0x400000+0x29362) [0x429362]
10: /lib64/libc.so.6 (__libc_start_main+0xed) [0x7f89db89464d]
11: /usr/bin/X (0x400000+0x29779) [0x429779]
Comment 23 Jason Stubbs 2013-05-31 03:37:16 UTC
(In reply to comment #20)
> commit 1fdd7db94b55c65ea62cc9eaefff620b20e9e4ea
> Author: Dave Airlie <airlied@redhat.com>
> Date:   Mon Jan 7 15:28:53 2013 +1000
> 
> It didn't revert cleanly against master and how to fix the conflict wasn't
> clear to me so I haven't been able to test that.

I spoke too soon. Reverting was as easy as deleting all the additions; I had mistakenly thought the additions needed to be kept.

So, to cut a long story short, HEAD with the above patch reverted works for me.
Comment 24 Ilia Mirkin 2013-05-31 04:28:58 UTC
Could you post the disassembly of the function? Would like to see exactly where in that method it's dying. (I wonder if it's crtc->randr_crtc... something with a 0x2d0 struct offset.)

Also, the latest version of that code has added a #ifdef NOUVEAU_PIXMAP_SHARING around the added code in drmmode_set_mode_major... (which I assume you've tried running as well). What happens if you just remove that whole ifdef? (But leave the rest of the commit in.)
Comment 25 Jason Stubbs 2013-05-31 04:53:15 UTC
Created attachment 80079 [details]
Disassembly of drmmode_set_mode_major

This is taken running rev 1fdd7db94b55c65ea62cc9eaefff620b20e9e4ea

The segfault happens in:

349             if (crtc->randr_crtc->scanout_pixmap)
   0x00000000000263e2 <+498>:   mov    0x1b8(%rbp),%rax
   0x00000000000263e9 <+505>:   xor    %ecx,%ecx
   0x00000000000263eb <+507>:   xor    %r8d,%r8d
   0x00000000000263f7 <+519>:   cmpq   $0x0,0x2d0(%rax)
   0x00000000000263ff <+527>:   je     0x264f0 <drmmode_set_mode_major+768>

350                     x = y = 0;


HEAD with the patch reverted and the #ifdef'd chunks removed succeeds, but with the patch reverted and the #ifdef'd chunks kept (plus a function declaration added) it fails. A pristine HEAD with the #ifdef'd chunks removed also succeeds.
Comment 26 Jason Stubbs 2013-05-31 04:56:01 UTC
Just so you don't think I've fallen off the face of the earth, I'm about to go home for the weekend and I can only reproduce the issue on my work PC (the only place I'm lucky enough to have three monitors!) so I won't be able to test again until Monday.
Comment 27 Ilia Mirkin 2013-05-31 05:30:28 UTC
So the faulting address is

0x00000000000263f7 <+519>:   cmpq   $0x0,0x2d0(%rax)

Which means that crtc->randr_crtc is NULL (and ->scanout_pixmap is 0x2d0 bytes into the structure).

Hopefully this should provide enough info to someone more knowledgeable than myself to figure out what's going on.
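Ilia's reading can be checked with plain offset arithmetic. In the sketch below the structs are hypothetical stand-ins, padded so that `randr_crtc` sits at 0x1b8 (the `mov 0x1b8(%rbp),%rax` load, assuming %rbp holds `crtc`) and `scanout_pixmap` at 0x2d0 (the faulting `cmpq`). Only those two offsets come from the disassembly; the padding replaces all the other members of the real xf86CrtcRec and RRCrtcRec. With `randr_crtc == NULL`, the compared address is 0 + 0x2d0 = 0x2d0, inside the unmapped first page, hence the segfault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: padding places the fields at the offsets seen
 * in the disassembly; nothing else here mirrors the real structs. */
struct fake_randr_crtc {
    unsigned char other_fields[0x2d0];
    void *scanout_pixmap;                 /* cmpq $0x0,0x2d0(%rax) */
};

struct fake_crtc {
    unsigned char other_fields[0x1b8];
    struct fake_randr_crtc *randr_crtc;   /* mov 0x1b8(%rbp),%rax  */
};

_Static_assert(offsetof(struct fake_crtc, randr_crtc) == 0x1b8,
               "randr_crtc must sit at 0x1b8");
_Static_assert(offsetof(struct fake_randr_crtc, scanout_pixmap) == 0x2d0,
               "scanout_pixmap must sit at 0x2d0");

/* Address that `crtc->randr_crtc->scanout_pixmap` would read, computed
 * without dereferencing: pointer value plus field offset. */
uintptr_t scanout_pixmap_address(const struct fake_crtc *crtc)
{
    return (uintptr_t)crtc->randr_crtc
         + offsetof(struct fake_randr_crtc, scanout_pixmap);
}
```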
Comment 28 Emil Velikov 2013-07-30 10:27:03 UTC
Guys, please try xf86-video-nouveau 1.0.9.
It contains the following patch, which should handle the crtc->randr_crtc == NULL case spotted by Ilia:

commit be44e7804862b4c276ed4d4717b1212920f428e6
Author: Dave Airlie <airlied@gmail.com>
Date:   Tue Jul 30 15:26:46 2013 +1000

    nouveau: fix crash when xinerama is enabled.
    
    Signed-off-by: Dave Airlie <airlied@redhat.com>

Closing for now
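The patch body itself isn't quoted in this report. Purely as an illustration, a guard for the case Ilia spotted could look like the following C sketch. The reduced `struct crtc`/`struct rr_crtc` types and both function names are hypothetical, and the `x = y = 0` branch mirrors source lines 349-350 from the disassembly in comment 25.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced types: only the fields this check looks at. */
struct rr_crtc {
    void *scanout_pixmap;        /* non-NULL when a shared pixmap is set */
};

struct crtc {
    struct rr_crtc *randr_crtc;  /* was NULL in the crashing setups here */
};

/* True only if the CRTC has a RandR CRTC and that CRTC has a scanout
 * pixmap; a NULL randr_crtc counts as "no pixmap" instead of being
 * dereferenced. */
bool crtc_has_scanout_pixmap(const struct crtc *crtc)
{
    return crtc->randr_crtc != NULL &&
           crtc->randr_crtc->scanout_pixmap != NULL;
}

/* Mirrors lines 349-350: reset the origin only when there is a scanout
 * pixmap; otherwise leave x and y untouched. */
void choose_scanout_origin(const struct crtc *crtc, int *x, int *y)
{
    if (crtc_has_scanout_pixmap(crtc))
        *x = *y = 0;
}
```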
