79823 – [NV30/gallium] Mozilla apps freeze on startup with nouveau-dri-10.2.1 libs on dual-screen

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/nouveau
Version:	unspecified
Hardware:	x86 (IA32) Linux (All)

Importance:	medium normal
Assignee:	Nouveau Project

Reported:	2014-06-09 09:03 UTC by Natrio
Modified:	2014-06-25 05:34 UTC
CC List:	1 user

Attachments
Htop screenshot (10.73 KB, image/png) 2014-06-09 09:03 UTC, Natrio
Valgrind glxtest logs (15.53 KB, text/plain) 2014-06-18 07:51 UTC, Natrio
GDB log with reverted patch (2.40 KB, text/plain) 2014-06-19 07:39 UTC, Natrio
GDB log with clean mesa-10.2.1 (1.54 KB, text/plain) 2014-06-19 07:55 UTC, Natrio
GDB log with "avoid dangling..." patch (1.54 KB, text/plain) 2014-06-19 08:08 UTC, Natrio
GDB log with both patches (1.59 KB, text/plain) 2014-06-19 08:18 UTC, Natrio
GDB log with "dup fd..." patch (3.25 KB, text/plain) 2014-06-19 09:38 UTC, Natrio

Description Natrio 2014-06-09 09:03:14 UTC

Created attachment 100709 [details]
Htop screenshot

All tested Mozilla apps (Firefox, Thunderbird and SeaMonkey) freeze on startup with the nouveau-dri-10.2.1 libraries. If the nouveau-dri libraries are downgraded to 10.1.4, the freezes go away, even without restarting the X server.

The freeze is localized in a forked Mozilla process (consuming ~100% CPU); killing this process lets Mozilla's startup continue.

This bug appears only in the nouveau dual-screen configuration. Intel dual-screen works normally, and nouveau single-screen works too.

Arch Linux, i686
Linux kernels: 3.14.6-1 and 3.10.42-1 (LTS)
Hardware: GeForce FX5200

/etc/X11/xorg.conf.d/21-nouveau.conf :
Section "ServerLayout"
    Identifier     "dual"
    Screen      0  "Screen0"
    Screen      1  "Screen1" RightOf "Screen0"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Plug-n-play"
    DisplaySize     323    244
    HorizSync       30.0 - 69.0
    VertRefresh     50.0 - 160.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "TV-0"
    DisplaySize     200    150
EndSection

Section "Device"
    Identifier     "Videocard0"
    Driver         "nouveau"
    BusID          "PCI:1:0:0"
    Screen          0
    Option          "ZaphodHeads" "VGA-1"
EndSection

Section "Device"
    Identifier     "Videocard1"
    Driver         "nouveau"
    BusID          "PCI:1:0:0"
    Screen          1
    Option          "ZaphodHeads" "TV-1"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Videocard0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
        Modes       "1024x768_85"
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Videocard1"
    Monitor        "Monitor1"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
        Modes       "720x576"
    EndSubSection
EndSection

Comment 1 Ilia Mirkin 2014-06-09 13:25:48 UTC

There haven't been too many changes between 10.1 and 10.2 to the nv30 driver. Mind doing a bisect? It seems reasonable to restrict it to src/gallium/drivers/nouveau/nv30.

By the way, why are you using zaphod heads + 2 separate screens (and no Xinerama, I assume) instead of RandR, with everything in one dynamically configurable screen?

Comment 2 Natrio 2014-06-09 15:58:14 UTC

(In reply to comment #1)
> There haven't been too many changes between 10.1 and 10.2 to the nv30
> driver. Mind doing a bisect? It seems reasonable to restrict it to
> src/gallium/drivers/nouveau/nv30.
I've never done a bisect before, but I have strace logs of Mozilla starting up, with and without the freeze. The freeze log is very big (9 MB), but if it helps, I will post it tomorrow.

> By the way, why are you using zaphod heads + 2 separate screens (and no
> xinerama I assume) instead of randr and having it all in one screen and
> dynamically configurable.
I don't need a joint dynamic screen for two monitors. The second screen is strictly dedicated to video playback on the TV, and the first, regular screen on the monitor is used for everything else.

Comment 3 Ilia Mirkin 2014-06-09 16:01:17 UTC

(In reply to comment #2)
> (In reply to comment #1)
> > There haven't been too many changes between 10.1 and 10.2 to the nv30
> > driver. Mind doing a bisect? It seems reasonable to restrict it to
> > src/gallium/drivers/nouveau/nv30.
> I've never done a bisect before, but I have an strace log from mozilla start
> – with and without freeze. The freeze log is so big (9MB), but if it helps,
> I will post it tomorrow.

strace log is not that useful. It's probably sitting around waiting for something to happen, which never happens. Bisecting is fairly easy... there are a bunch of guides online. I'd recommend installing into some scratch location and using LD_LIBRARY_PATH to pick up your new lib, so that you don't mess up your system install.

Comment 4 Ilia Mirkin 2014-06-09 16:04:53 UTC

(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > There haven't been too many changes between 10.1 and 10.2 to the nv30
> > > driver. Mind doing a bisect? It seems reasonable to restrict it to
> > > src/gallium/drivers/nouveau/nv30.
> > I've never done a bisect before, but I have an strace log from mozilla start
> > – with and without freeze. The freeze log is so big (9MB), but if it helps,
> > I will post it tomorrow.
> 
> strace log is not that useful. It's probably sitting around waiting for
> something to happen, which never happens. Bisecting is fairly easy... there
> are a bunch of guides online. I'd recommend installing into some scratch
> location and using LD_LIBRARY_PATH to pick up your new lib, so that you
> don't mess up your system install.

Oh, and you can find some build instructions at http://nouveau.freedesktop.org/wiki/InstallNouveau/#mesa
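Under the hood, a bisect is a binary search over the commit range. `git bisect` does the bookkeeping, and `git bisect start <bad> <good> -- src/gallium/drivers/nouveau/nv30` restricts it to commits touching that directory, as suggested in comment 1. The sketch below shows only the search logic, assuming a single good-to-bad transition; `first_bad_commit`, `is_bad_fn` and `bad_from` are hypothetical names, and the predicate stands in for "build, install to a scratch prefix, run with LD_LIBRARY_PATH, check for the freeze".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: returns nonzero if the build of commit i shows
 * the bug (in practice: build, install to a scratch prefix, run the app
 * with LD_LIBRARY_PATH pointing there, and watch for the freeze). */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/* Binary search over commits[0..n-1] for the first bad commit, assuming
 * commit 0 is known good, commit n-1 is known bad, and the bug never goes
 * away once introduced. Needs about log2(n) test builds. */
size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1;   /* invariant: good is good, bad is bad */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example predicate for trying the search: commits >= *ctx are bad. */
int bad_from(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

For example, with 20 commits and `t = 7`, `first_bad_commit(20, bad_from, &t)` tests indices 9, 4, 6 and 7 before returning 7.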

Comment 5 Natrio 2014-06-09 16:14:23 UTC

(In reply to comment #3)
> strace log is not that useful. It's probably sitting around waiting for
> something to happen, which never happens. Bisecting is fairly easy... there
> are a bunch of guides online. I'd recommend installing into some scratch
> location and using LD_LIBRARY_PATH to pick up your new lib, so that you
> don't mess up your system install.
It's not exactly "sitting around waiting": it is looping (using up to 100% CPU) and keeps repeating the same action, probably inside the nouveau-dri library.

(In reply to comment #4)
> Oh, and you can find some build instructions at
> http://nouveau.freedesktop.org/wiki/InstallNouveau/#mesa
Thank you, I will try to do it.

Comment 6 Natrio 2014-06-10 07:54:17 UTC

From freeze strace log:

 # Device opened:
open("/dev/dri/card0", O_RDWR|O_CLOEXEC) = 5
 ...
close(5)                    = 0
 # device /dev/dri/card0 closed, but trying to use already closed descriptor:
fstat64(5, 0xbfa97f60)      = -1 EBADF (Bad file descriptor)
ioctl(5, 0xc0406481, 0xbfa97fc0) = -1 EBADF (Bad file descriptor)
write(2, "nouveau: kernel rejected pushbuf"..., 54) = 54
write(2, "nouveau: ch0: krec 0 pushes 1 bu"..., 46) = 46
write(2, "nouveau: ch0: buf 00000000 00000"..., 63) = 63
write(2, "nouveau: ch0: psh 00000000 00000"..., 49) = 49
write(2, "nouveau: \t0x0008fd6c\n", 21) = 21
write(2, "nouveau: \t0x00000000\n", 21) = 21
write(2, "nouveau: \t0x00000003\n", 21) = 21
write(2, "nouveau: \t0x00020000\n", 21) = 21
write(2, "nouveau: \t0x00000000\n", 21) = 21

 # Cycling infinitely
sched_yield()               = 0
sched_yield()               = 0
sched_yield()               = 0
 ...
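The strace above shows two separate effects: every call on the stale descriptor 5 fails with EBADF, and the process then spins in sched_yield(). The sketch below reproduces the first with /dev/null standing in for /dev/dri/card0, and models the second as a bounded wait. `probe_fd`, `use_after_close` and `wait_seq` are hypothetical helpers, not nouveau functions, and the assumption that the loop waits on a fence sequence number that the rejected submission never advances is only a reading of the log.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if fd refers to an open file, otherwise -errno. */
int probe_fd(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? 0 : -errno;
}

/* Open a file, close it, then reuse the stale descriptor number,
 * mirroring the open()/close()/fstat64() sequence in the strace. */
int use_after_close(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return -errno;
    close(fd);
    return probe_fd(fd);   /* -EBADF, unless another open() reused the number */
}

/* Hypothetical model of the hang: spin with sched_yield() until a fence
 * sequence number reaches `target`. If the submission that would have
 * advanced it was rejected, it never does; `max_spins` bounds this sketch. */
int wait_seq(const volatile unsigned *seq, unsigned target,
             unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (*seq >= target)
            return 0;
        sched_yield();
    }
    return -1;             /* an unbounded version would spin forever */
}
```

A stale descriptor number is dangerous even when it does not fail: if anything opens a new file in between, the number is reused and later calls silently act on the wrong file.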

Comment 7 Ilia Mirkin 2014-06-10 20:56:13 UTC

(In reply to comment #6)
> From freeze strace log:
> 
>  # Device opened:
> open("/dev/dri/card0", O_RDWR|O_CLOEXEC) = 5
>  ...
> close(5)                    = 0
>  # device /dev/dri/card0 closed, but trying to use already closed descriptor:
> fstat64(5, 0xbfa97f60)      = -1 EBADF (Bad file descriptor)
> ioctl(5, 0xc0406481, 0xbfa97fc0) = -1 EBADF (Bad file descriptor)
> write(2, "nouveau: kernel rejected pushbuf"..., 54) = 54

Curious. Do you have a dmesg from that? And the full output that was written out to stderr?

A bisect will still be the most effective way of tracking down the issue.

Comment 8 Natrio 2014-06-11 05:06:14 UTC

(In reply to comment #7)
> Curious. Do you have a dmesg from that? And the full output that was written
> out to stderr?
No messages in any system log at that time.
No stderr output: the library writes to descriptor #2, but Mozilla had opened it as /dev/null.

> A bisect will still be the most effective way of tracking down the issue.
Bisecting in progress, 2 steps out of 10.

Comment 9 Natrio 2014-06-11 10:05:20 UTC

Bisecting isn't finished yet, but I suspect this commit:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=507f0230d4ca2238c818006499e21abb4c133203

Comment 10 Natrio 2014-06-11 12:12:48 UTC

Bisect complete, I was right.

507f0230d4ca2238c818006499e21abb4c133203 is the first bad commit
commit 507f0230d4ca2238c818006499e21abb4c133203
Author: Ilia Mirkin <imirkin@alum.mit.edu>
Date:   Wed Mar 5 22:25:55 2014 -0500

    nouveau: fix fence waiting logic in screen destroy
    
    nouveau_fence_wait has the expectation that an external entity is
    holding onto the fence being waited on, not that it is merely held onto
    by the current pointer. Fixes a use-after-free in nouveau_fence_wait
    when used on the screen's current fence.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75279
    Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
    Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>

:040000 040000 46ad31dd95db1695a8c91fffbf38ee7822d8a31f cdf816d872018460b33f69949010f09dcdcbf38e M src
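The commit message describes a reference-counting rule: a function that may drop references internally should only be called on an object that the caller keeps alive independently. A minimal sketch of that rule, using a hypothetical, simplified `struct fence` and helper names rather than nouveau's actual fence code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified fence; not nouveau's struct nouveau_fence. */
struct fence {
    int refcount;
    int signalled;
};

int fences_alive;   /* lets the usage example check that nothing leaked */

struct fence *fence_create(void)
{
    struct fence *f = calloc(1, sizeof *f);
    assert(f != NULL);
    f->refcount = 1;
    fences_alive++;
    return f;
}

void fence_ref(struct fence *f) { f->refcount++; }

void fence_unref(struct fence *f)
{
    if (--f->refcount == 0) {
        free(f);
        fences_alive--;
    }
}

/* Stand-in for a wait that, as a side effect, retires the fence and drops
 * the reference held by *slot (the screen's "current" pointer). */
void wait_and_retire(struct fence *f, struct fence **slot)
{
    f->signalled = 1;
    *slot = NULL;
    fence_unref(f);
}

/* Safe pattern: pin the fence with a local reference before waiting.
 * Without the fence_ref() call, wait_and_retire() would free f and the
 * read of f->signalled below would be a use-after-free. */
int wait_on_current(struct fence **slot)
{
    struct fence *f = *slot;
    if (f == NULL)
        return 0;
    fence_ref(f);
    wait_and_retire(f, slot);
    int done = f->signalled;   /* safe: we still hold a reference */
    fence_unref(f);            /* drops the last reference, freeing f */
    return done;
}
```

Per the commit message, the fix applies this rule to the screen's current fence; the sketch does not model the GPU side of the wait.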

Comment 11 Ilia Mirkin 2014-06-11 16:55:39 UTC

Hm, that's funny. You said 10.1.4 worked, which contains that commit... I suspected that commit as well until I saw that you said 10.1.4 was fine. I'll think about it.

Comment 12 Natrio 2014-06-11 17:17:46 UTC

(In reply to comment #11)
> Hm, that's funny. You said 10.1.4 worked, which contains that commit... I
> suspected that commit as well until I saw that you said 10.1.4 was fine.
> I'll think about it.

I said the 10.1.4 that WAS fine is the prebuilt package (built 21.06.2014) downloaded from the Arch Linux repository.
But in my current build environment, all revisions since the bisected commit seem buggy on nv30.

I will try reverting this patch tomorrow.

Comment 13 Ilia Mirkin 2014-06-11 17:21:09 UTC

(In reply to comment #12)
> (In reply to comment #11)
> > Hm, that's funny. You said 10.1.4 worked, which contains that commit... I
> > suspected that commit as well until I saw that you said 10.1.4 was fine.
> > I'll think about it.
> 
> I said 10.1.4 WAS fine, already builded (21.06.2014) and downloaded from
> Arch Linux repository.

Yes, that was my implication as well.

> But in current building environment, all revisions since bisected commit
> seems buggy on nv30.
> 
> I will try to revert this patch on tomorrow.

There's probably some additional interaction with a patch that hadn't made it to 10.1... perhaps the screen sharing change to better support NV_vdpau_interop, but I don't really see how that could cause issues either.

Comment 14 Natrio 2014-06-14 08:32:46 UTC

Tested version 10.2.1: no freeze with the patch to nv30_screen.c reverted.

Comment 15 Natrio 2014-06-16 08:07:24 UTC

The bug affects not only Mozilla, but also glxinfo:

-- Version 10.2.1:
 freezes after the end of the (normal) output, and prints this to stderr:
nouveau: kernel rejected pushbuf: Bad file descriptor
nouveau: ch0: krec 0 pushes 1 bufs 1 relocs 0
nouveau: ch0: buf 00000000 00000002 00000004 00000004 00000000
nouveau: ch0: psh 00000000 0000000140 0000000154
nouveau:        0x0008fd6c
nouveau:        0x00000000
nouveau:        0x00000004
nouveau:        0x00020000
nouveau:        0x00000000

-- Version 10.2.1 (reverted patch):
 seems to be OK.

-- Version 10.1.4 (from repository):
 segfaults after the (normal) output; in dmesg:
glxinfo[9715]: segfault at 88 ip b6d88829 sp bffa4420 error 4 in nouveau_dri.so[b6a37000+64a000]

-- Last revision before bisected commit:
 seems to be OK.

-- Bisected commit:
 freeze, stderr:
nouveau: kernel rejected pushbuf: Bad file descriptor
nouveau: ch0: krec 0 pushes 1 bufs 1 relocs 0
nouveau: ch0: buf 00000000 00000002 00000004 00000004 00000000
nouveau: ch0: psh 00000000 0000000140 0000000154
nouveau:        0x0008fd6c
nouveau:        0x00000000
nouveau:        0x00000004
nouveau:        0x00020000
nouveau:        0x00000000

Comment 16 Natrio 2014-06-16 08:27:50 UTC

glxtest (from bug #94465):

-- Version 10.1.4 (from repository):
VENDOR
nouveau
RENDERER
Gallium 0.4 on NV34
VERSION
1.5 Mesa 10.1.4
TFP
TRUE

-- Version 10.2.1:
nouveau: kernel rejected pushbuf: Bad file descriptor
nouveau: ch0: krec 0 pushes 1 bufs 1 relocs 0
nouveau: ch0: buf 00000000 00000002 00000004 00000004 00000000
nouveau: ch0: psh 00000000 000000012c 0000000140
nouveau:        0x0008fd6c
nouveau:        0x00000000
nouveau:        0x00000003
nouveau:        0x00020000
nouveau:        0x00000000
 (freeze)

-- Version 10.2.1 (reverted patch):
VENDOR
nouveau
RENDERER
Gallium 0.4 on NV34
VERSION
1.5 Mesa 10.2.1
TFP
TRUE

-- Last revision before bisected commit:
VENDOR
nouveau
RENDERER
Gallium 0.4 on NV34
VERSION
1.5 Mesa 10.2.0-devel (git-5bf90cb)
TFP
TRUE

-- Bisected commit:
nouveau: kernel rejected pushbuf: Bad file descriptor
nouveau: ch0: krec 0 pushes 1 bufs 1 relocs 0
nouveau: ch0: buf 00000000 00000002 00000004 00000004 00000000
nouveau: ch0: psh 00000000 000000012c 0000000140
nouveau:        0x0008fd6c
nouveau:        0x00000000
nouveau:        0x00000003
nouveau:        0x00020000
nouveau:        0x00000000
 (freeze)

Comment 17 Natrio 2014-06-16 08:30:41 UTC

Sorry, glxtest is from bug #75279, of course.

Comment 18 Ilia Mirkin 2014-06-16 18:12:12 UTC

What happens if you add

screen->base.pushbuf->user_priv = NULL;

right before the if (screen->base.fence.current) check in nv30_screen_destroy?

Comment 19 Natrio 2014-06-17 08:52:50 UTC

(In reply to comment #18)
> What happens if you add
> 
> screen->base.pushbuf->user_priv = NULL
> 
> right before the if (screen->base.fence.current) check in
> nv30_screen_destroy?

No effect.

After updating llvm and llvm-libs from 3.4.1 to 3.4.2, glxgears also started to freeze instead of exiting normally on ESC:

Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
2469 frames in 5.0 seconds = 493.688 FPS
2476 frames in 5.0 seconds = 495.177 FPS
nouveau: kernel rejected pushbuf: Bad file descriptor
nouveau: ch0: krec 0 pushes 1 bufs 1 relocs 0
nouveau: ch0: buf 00000000 00000004 00000004 00000004 00000000
nouveau: ch0: psh 00000000 0000025e84 0000025e98
nouveau:        0x0008fd6c
nouveau:        0x00000000
nouveau:        0x00001437
nouveau:        0x00020000
nouveau:        0x00000000
 (freeze)

Comment 20 Ilia Mirkin 2014-06-18 02:45:04 UTC

(In reply to comment #19)
> (In reply to comment #18)
> > What happens if you add
> > 
> > screen->base.pushbuf->user_priv = NULL
> > 
> > right before the if (screen->base.fence.current) check in
> > nv30_screen_destroy?
> 
> No effect.

Well, I've been unable to reproduce the "kernel rejected pushbuf" + hang behaviour, although I've only been testing with a NV43.

However I did observe the issue that I thought I would observe due to that line not being there... here's a slightly more correct version than my previous suggestion:

http://patchwork.freedesktop.org/patch/28050/

And another patch which I think is unlikely to affect what you're seeing, but who knows:

http://patchwork.freedesktop.org/patch/28049/

Assuming these don't help, could you run that glxtest application inside valgrind, and let me know if you see any illegal accesses. (Note that the first one of those patches will in fact prevent one such illegal access.)

Comment 21 Natrio 2014-06-18 07:23:09 UTC

Neither patch works for me.

See the valgrind logs in the attachment.

Comment 22 Natrio 2014-06-18 07:51:22 UTC

Created attachment 101283 [details]
Valgrind glxtest logs

Valgrind glxtest logs:
10.2.1 (clean);
10.2.1-reverted (without the bisected patch);
10.2.1-bufctx (avoid dangling... patch);
10.2.1-leaks (plug some memory leaks... patch);
10.2.1-bufctx_leaks (both patches)

Comment 23 Ilia Mirkin 2014-06-18 22:23:08 UTC

OK, well, I can't repro this, but clearly the device is getting closed _somehow_. It appears that you're relatively able at debugging things, let me know if I've misunderstood.

Can you load up glxtest in gdb (with my fixes), and do

break main
r
break close
break nv30_screen_destroy
c

and look at what happens. The first couple of closes will actually be at creation time because it opens/closes some config files, continue past those.

This is what I see:

(gdb) c
Continuing.

Breakpoint 2, nv50_screen_destroy (pscreen=0x633c60) at nv50/nv50_screen.c:295
295        struct nv50_screen *screen = nv50_screen(pscreen);
(gdb) n
297        if (!nouveau_drm_screen_unref(&screen->base))
(gdb) p screen->base.device->fd
$1 = 8
(gdb) c
Continuing.

Breakpoint 3, 0x00007ffff6db23d0 in close () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6db23d0 in close () from /lib64/libc.so.6
#1  0x00007ffff643797a in dri2DestroyScreen (base=0x618c50) at dri2_glx.c:712
#2  0x00007ffff63fcdb5 in FreeScreenConfigs (priv=0x615600) at glxext.c:208
#3  0x00007ffff63fce3f in glx_display_free (priv=0x615600) at glxext.c:231
#4  0x00007ffff63fd01a in __glXCloseDisplay (dpy=0x6070f0, codes=0x615688)
    at glxext.c:279
#5  0x00007ffff78bc0c2 in XCloseDisplay () from /usr/lib64/libX11.so.6
#6  0x00000000004011be in glxtest () at glxtest.cpp:202
#7  0x000000000040120a in main (argc=1, argv=0x7fffffffdda8) at glxtest.cpp:211
(gdb) up
#1  0x00007ffff643797a in dri2DestroyScreen (base=0x618c50) at dri2_glx.c:712
712        close(psc->fd);
(gdb) p psc->fd
$2 = 8
(gdb) c
Continuing.

Breakpoint 3, 0x00007ffff6db23d0 in close () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff6db23d0 in close () from /lib64/libc.so.6
#1  0x00007ffff6abddee in xcb_disconnect () from /usr/lib64/libxcb.so.1
#2  0x00007ffff78bc0f7 in XCloseDisplay () from /usr/lib64/libX11.so.6
#3  0x00000000004011be in glxtest () at glxtest.cpp:202
#4  0x000000000040120a in main (argc=1, argv=0x7fffffffdda8) at glxtest.cpp:211
(gdb) c

[this is actually with nv50, but the nv30 flow should be no different]

You must be getting something else, otherwise how is that fd getting closed already.

Comment 24 Natrio 2014-06-19 07:39:25 UTC

Created attachment 101333 [details]
GDB log with reverted patch

GDB log with the reverted (bisected) patch

Comment 25 Natrio 2014-06-19 07:55:47 UTC

Created attachment 101336 [details]
GDB log with clean mesa-10.2.1

 Before close:
$ ls -l /proc/7967/fd/
total 0
lrwx------ 1 user users 64 Jun 19 11:41 0 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 11:41 1 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 11:41 2 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 11:41 3 -> socket:[899028]
lrwx------ 1 user users 64 Jun 19 11:41 4 -> /dev/dri/card0
lrwx------ 1 user users 64 Jun 19 11:41 5 -> /dev/dri/card0
 after close:
$ ls -l /proc/7967/fd/
total 0
lrwx------ 1 user users 64 Jun 19 11:41 0 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 11:41 1 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 11:41 2 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 11:41 3 -> socket:[899028]
lrwx------ 1 user users 64 Jun 19 11:41 5 -> /dev/dri/card0

Comment 26 Natrio 2014-06-19 08:08:42 UTC

Created attachment 101338 [details]
GDB log with "avoid dangling..." patch

/dev/dri/card0 on fd #4 closed,
/dev/dri/card0 on fd #5 remained open.

Comment 27 Ilia Mirkin 2014-06-19 08:18:30 UTC

(In reply to comment #26)
> Created attachment 101338 [details]
> GDB log with "avoid dangling..." patch
> 
> /dev/dri/card0 on fd #4 closed,
> /dev/dri/card0 on fd #5 remained open.

So... for some *unknown* reason, the thing decides to create two screens. Fine. Now we have this thing which reuses screens for the same device... makes sense. But after it tries to close one of the screens, it also closes the fd that it used to create the screen with. Reasonable enough. However that fd is the one that's being used by the device.

Perhaps it should be dup'ing the fd and doing all the ioctls on that, so that this sort of thing doesn't happen. Maarten, thoughts?
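The dup suggestion can be sketched with plain POSIX calls: the shared device keeps its own descriptor for the same open file, so a screen closing the descriptor it opened no longer invalidates the device's. `take_private_fd` and `dup_survives_close` are hypothetical names, /dev/null stands in for the DRM node, and this is not the patch that was eventually merged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Give a long-lived owner (e.g. a shared device object) its own descriptor
 * for the same open file. F_DUPFD_CLOEXEC keeps close-on-exec, matching
 * the O_CLOEXEC of the original open(); the minimum of 3 keeps the copy
 * off stdin/stdout/stderr. */
int take_private_fd(int fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, 3);
}

/* Returns 1 if the duplicate is still usable after the original is
 * closed, 0 if not, or -errno if the open itself failed. */
int dup_survives_close(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return -errno;
    int priv = take_private_fd(fd);
    close(fd);                 /* e.g. the screen that opened it goes away */
    struct stat st;
    int ok = priv >= 0 && fstat(priv, &st) == 0;
    if (priv >= 0)
        close(priv);
    return ok;
}
```

Duplicating, rather than reopening the node, keeps both descriptors on one open file description. That is assumed to matter here because DRM ties objects such as GEM buffer handles to the open file, not to the descriptor number, so a fresh open() would not see the existing buffers.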

Comment 28 Natrio 2014-06-19 08:18:43 UTC

Created attachment 101339 [details]
GDB log with both patches

Same as the previous two runs: the device on fd #5 is still open after the freeze.

Comment 29 Natrio 2014-06-19 08:23:18 UTC

But with the bisected patch reverted, both fd #4 and fd #5 are opened and closed successfully.

Comment 30 Ilia Mirkin 2014-06-19 08:40:24 UTC

Mind testing out http://patchwork.freedesktop.org/patch/28219/ ?

Comment 31 Natrio 2014-06-19 09:38:01 UTC

Created attachment 101350 [details]
GDB log with "dup fd..." patch

It works!

 Before first close:
$ ls -l /proc/29981/fd/
total 0
lrwx------ 1 user users 64 Jun 19 13:20 0 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 1 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 2 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 3 -> socket:[1333878]
lrwx------ 1 user users 64 Jun 19 13:20 4 -> /dev/dri/card0
lrwx------ 1 user users 64 Jun 19 13:20 5 -> /dev/dri/card0
lrwx------ 1 user users 64 Jun 19 13:20 6 -> /dev/dri/card0

 After first close:
$ ls -l /proc/29981/fd/
total 0
lrwx------ 1 user users 64 Jun 19 13:20 0 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 1 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 2 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 3 -> socket:[1333878]
lrwx------ 1 user users 64 Jun 19 13:20 5 -> /dev/dri/card0
lrwx------ 1 user users 64 Jun 19 13:20 6 -> /dev/dri/card0

 After second close:
$ ls -l /proc/29981/fd/
total 0
lrwx------ 1 user users 64 Jun 19 13:20 0 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 1 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 2 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 3 -> socket:[1333878]
lrwx------ 1 user users 64 Jun 19 13:20 6 -> /dev/dri/card0

 After third close:
$ ls -l /proc/29981/fd/
total 0
lrwx------ 1 user users 64 Jun 19 13:20 0 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 1 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 2 -> /dev/pts/0
lrwx------ 1 user users 64 Jun 19 13:20 3 -> socket:[1333878]
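The `ls -l /proc/<pid>/fd/` listings above can also be produced from inside the process, which can be handy when stepping through the close breakpoints. A Linux-only sketch; `count_fds_to` is a hypothetical helper that counts descriptors whose /proc/self/fd link resolves to a given path:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Count this process's open descriptors whose /proc/self/fd entry
 * resolves to `target` (Linux-only). Returns -1 if the directory
 * cannot be read. */
int count_fds_to(const char *target)
{
    assert(target != NULL);
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;
    int dir_fd = dirfd(dir);            /* the listing's own descriptor */
    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')      /* skip "." and ".." */
            continue;
        if (atoi(ent->d_name) == dir_fd)
            continue;
        char link[64], dest[4096];
        snprintf(link, sizeof link, "/proc/self/fd/%s", ent->d_name);
        ssize_t n = readlink(link, dest, sizeof dest - 1);
        if (n < 0)
            continue;                   /* fd closed while we were looking */
        dest[n] = '\0';                 /* readlink() does not NUL-terminate */
        if (strcmp(dest, target) == 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling `count_fds_to("/dev/dri/card0")` at each step would give the number of card0 descriptors that the listings show by hand.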

Comment 32 Maarten Lankhorst 2014-06-19 11:48:39 UTC

*** Bug 79946 has been marked as a duplicate of this bug. ***

Comment 33 Ilia Mirkin 2014-06-19 17:47:19 UTC

(In reply to comment #31)
> Created attachment 101350 [details]
> GDB log with "dup fd..." patch
> 
> It works!

Great! Also I just figured out why this is happening to you and not to anyone else. It's the dual screens! (Who uses separate screens anyways...)

src/glx/glxext.c allocates a screen per... screen. And then when it cleans up, it deletes the screens in the same order it created them. So it's precisely the scenario I outlined, where by the time the second screen goes to be destroyed, the fd it relies on (which is the first one) has already been closed.

I'll push this patch out and tag it for 10.2.
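The ordering problem described above can be modelled in a few dozen lines. This is a hypothetical simplification, not Mesa's code: one device object is shared by every screen on the card and created from the first screen's descriptor; each screen's own descriptor is closed when that screen is destroyed, after the driver-side teardown (the order seen in the gdb backtrace in comment 23); and the last screen to go away does a final operation on the device's descriptor, with an fstat standing in for the pushbuf submission. Screens are destroyed in creation order.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical model: one device shared by all screens on the card,
 * plus one descriptor per screen owned by the GLX side. */
struct device { int fd; int refcount; int owns_fd; };
struct screen { int fd; struct device *dev; };

static struct device shared;     /* the per-card device cache */
static int teardown_errno;       /* errno seen by the final "submit" */

static void screen_create(struct screen *s, int fd, int dup_fd)
{
    s->fd = fd;
    if (shared.refcount++ == 0) {          /* first screen creates it */
        shared.owns_fd = dup_fd;
        shared.fd = dup_fd ? fcntl(fd, F_DUPFD_CLOEXEC, 3) : fd;
    }
    s->dev = &shared;
}

static void screen_destroy(struct screen *s)
{
    if (--s->dev->refcount == 0) {         /* last screen tears it down */
        struct stat st;
        /* Stand-in for the final submission on the device's fd. */
        teardown_errno = fstat(s->dev->fd, &st) == 0 ? 0 : errno;
        if (s->dev->owns_fd)
            close(s->dev->fd);
    }
    close(s->fd);                          /* GLX side closes its own fd */
}

/* Create two screens, destroy them in creation order, and report the
 * errno seen by the device teardown (0 if it succeeded). */
int two_screen_teardown(int dup_fd)
{
    struct screen a, b;
    teardown_errno = 0;
    int fa = open("/dev/null", O_RDWR | O_CLOEXEC);
    assert(fa >= 0);
    screen_create(&a, fa, dup_fd);
    int fb = open("/dev/null", O_RDWR | O_CLOEXEC);
    assert(fb >= 0);
    screen_create(&b, fb, dup_fd);
    screen_destroy(&a);   /* closes fa, which the device may still use */
    screen_destroy(&b);
    return teardown_errno;
}
```

In this model `two_screen_teardown(0)` returns EBADF, corresponding to the "kernel rejected pushbuf: Bad file descriptor" message, while `two_screen_teardown(1)` returns 0. With a single screen the device's last user is also the screen whose descriptor it borrowed, and that descriptor is still open during teardown, which fits why only the dual-screen setup was affected.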

Comment 34 Ilia Mirkin 2014-06-24 13:33:47 UTC

A (very slightly) improved version of the patch is now in mesa master. Hopefully it'll get backported to 10.2.2, but if not, it'll be in 10.2.3.

Comment 35 Natrio 2014-06-25 05:34:31 UTC

(In reply to comment #34)
> A (very slightly) improved version of the patch is now in mesa master.
> Hopefully it'll get backported to 10.2.2, but if not, it'll be in 10.2.3.

An improved patch
http://cgit.freedesktop.org/mesa/mesa/commit/?id=a59f2bb17bcc78e09653391748549e7973990798
was tested and also works for me.
