Bug 49973 - colord-sane: segfault with error 6 in libdbus-1.so.3.7.0
Summary: colord-sane: segfault with error 6 in libdbus-1.so.3.7.0
Status: RESOLVED NOTOURBUG
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: 1.5
Hardware: All All
: medium normal
Assignee: Havoc Pennington
QA Contact: John (J5) Palmieri
URL: http://bugs.debian.org/cgi-bin/bugrep...
Whiteboard:
Keywords: have-backtrace
Depends on:
Blocks:
 
Reported: 2012-05-15 13:48 UTC by Paul Menzel
Modified: 2014-04-30 18:08 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
logs for shutdown hanging and the start afterwards (121.87 KB, text/plain)
2013-04-07 18:51 UTC, Valentin Brasov
Details

Description Paul Menzel 2012-05-15 13:48:55 UTC
Marcos and I see several crashes of colord as reported in the Debian BTS already.

        $ zgrep segfault /var/log/syslog* # output edited
        /var/log/syslog: joey kernel: [11002.555364] colord-sane[5803]: segfault at 4 ip 00000000f571d482 sp 00000000f729cfb8 error 6 in libdbus-1.so.3.7.0[f56f2000+49000]
        /var/log/syslog.0: joey kernel: [12857.345724] colord-sane[3014]: segfault at 19 ip 00000000f562748a sp 00000000f71a1fb8 error 6 in libdbus-1.so.3.7.0[f55fc000+49000]
        /var/log/syslog.0: joey kernel: [11557.500247] colord-sane[2850]: segfault at 21 ip 00000000f572748a sp 00000000f729ffb8 error 6 in libdbus-1.so.3.7.0[f56fc000+49000]
        /var/log/syslog.1.gz: joey kernel: [12941.969023] colord-sane[2934]: segfault at 2c ip 00000000f564a482 sp 00000000f71c6fb8 error 6 in libdbus-1.so.3.7.0[f561f000+49000]
        /var/log/syslog.1.gz: joey kernel: [25366.027574] colord-sane[2854]: segfault at 4 ip 00000000f4e03482 sp 00000000f71abfb8 error 6 

I had similar problems with packagekitd reported and fixed in
#658293 [2]. I was told my D-Bus setup is messed up. But I was not told
what might be wrong and still programs should not crash because of that.
I am very interested how my D-Bus setup is wrong and how I can remedy
that. In the meanwhile it would be great if colord could be fixed.

Also I tried to reproduce that error following the instructions of the colord page [3]. But I cannot reproduce that segfault since colord exits right away.

$ /usr/lib/i386-linux-gnu/colord/colord --verbose
04:32:57	Verbose debugging enabled (on console 1)
04:32:57	CdMappingDb: trying to open database '/var/lib/colord/mapping.db'
04:32:57	CdDeviceDb: trying to open database '/var/lib/colord/storage.db'
04:32:57	CdMain: lost name: org.freedesktop.ColorManager

$ /usr/lib/i386-linux-gnu/colord/colord-sane --verbose
$ echo $?
0


[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=668225
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=658293
[3] http://www.freedesktop.org/software/colord/bugs.html
Comment 1 Richard Hughes 2012-05-16 00:48:42 UTC
(In reply to comment #0)
> I was told my D-Bus setup is messed up. But I was not told
> what might be wrong and still programs should not crash because of that.

Yes, they should. Frameworks that map core libraries like libdbus-1.so.3.7.0 in process can't do *anything* if something inside the linked library crashes. It's just not possible to recover on a UNIX-like shared library model.

> I am very interested how my D-Bus setup is wrong and how I can remedy
> that.

DBus has to work correctly, or a lot of things are going to crash.

> In the meanwhile it would be great if colord could be fixed.

I really don't think this is a colord problem. It's segfaulting inside libdbus-1.so.3.7.0...
Comment 2 Paul Menzel 2012-05-16 03:04:09 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > I was told my D-Bus setup is messed up. But I was not told
> > what might be wrong and still programs should not crash because of that.
> 
> Yes, they should. Frameworks that map core libraries like libdbus-1.so.3.7.0 in
> process can't do *anything* if something inside the linked library crashes.
> It's just not possible to recover on a UNIX-like shared library model.

You are right. I mixed that up.

> > I am very interested how my D-Bus setup is wrong and how I can remedy
> > that.
>
> DBus has to work correctly, or a lot of things are going to crash.
>
> > In the meanwhile it would be great if colord could be fixed.
>
> I really don't think this is a colord problem. It's segfaulting inside
> libdbus-1.so.3.7.0...

True again. I therefore reopen this ticket and change the product to D-Bus.
Comment 3 Paul Menzel 2012-05-16 03:06:15 UTC
Here is some version information.

$ dpkg -l libdbus-1-3
[…]
ii  libdbus-1-3:i386              1.5.12-1                      simple interprocess messaging system (library)
$ dpkg -L libdbus-1-3
/.
/lib
/lib/i386-linux-gnu
/lib/i386-linux-gnu/libdbus-1.so.3.7.0
[…]

Please tell me what other information I can provide.
Comment 4 Paul Menzel 2012-05-16 04:01:10 UTC
Ok, in IRC channel #dbus (IRC) I was told to capture a backtrace. Since the segfaults mostly happen during start up I will have to figure out a way to start colord-sane automatically from GDB. Or a way to get a coredump I guess.
Comment 5 Paul Menzel 2012-07-04 03:48:28 UTC
On June 6th DBus was upgraded to 1.6.0-1.

        libdbus-1-3:i386 1.5.12-1 -> 1.6.0-1

The crash in libdbus-1.so.3.7.1 was still there.

        /var/log/syslog.6.gz:Jun 27 [ 7984.923591] colord-sane[3507]: segfault at 19 ip b55de34a sp b55b1fb8 error 6 in libdbus-1.so.3.7.1[b55b3000+49000]
        /var/log/syslog.5.gz:Jun 27 [ 1837.738688] colord-sane[3510]: segfault at 19 ip b56b834a sp b568bfb8 error 6 in libdbus-1.so.3.7.1[b568d000+49000]
        /var/log/syslog.5.gz:Jun 27 [  965.881639] colord-sane[3518]: segfault at 19 ip b560734a sp b7167fb8 error 6 in libdbus-1.so.3.7.1[b55dc000+49000]
        /var/log/syslog.5.gz:Jun 27 [ 3660.161146] colord-sane[3491]: segfault at 20474e4d ip b565c342 sp b562ffb8 error 6 in libdbus-1.so.3.7.1[b5631000+49000]
        /var/log/syslog.5.gz:Jun 27 [ 1545.183620] colord-sane[3499]: segfault at 19 ip b568c34a sp b565ffb8 error 6 in libdbus-1.so.3.7.1[b5661000+49000]
        /var/log/syslog.4.gz:Jun 29 [11734.265144] colord-sane[3502]: segfault at 4 ip b55db342 sp b713bfb8 error 6 in libdbus-1.so.3.7.1[b55b0000+49000]

After this `libdbus-1-3` was upgraded to 1.6.2-1 and I have not seen the crash since. There is a new crash though now.

        /var/log/syslog.1.gz:Jul  3 00:48:31 mattotaupa kernel: [19959.302068] colord-sane[3119]: segfault at 0 ip b736a935 sp bfee3c8c error 4 in libc-2.13.so[b72d4000+156000]

So it looks like this has been fixed in DBus.
Comment 6 Simon McVittie 2012-07-19 10:54:39 UTC
(In reply to comment #4)
> Ok, in IRC channel #dbus (IRC) I was told to capture a backtrace. Since the
> segfaults mostly happen during start up I will have to figure out a way to
> start colord-sane automatically from GDB. Or a way to get a coredump I guess.

Yes, please. There isn't enough information here to locate the bug.

Richard: perhaps you can suggest how to obtain a stack trace from colord-sane?
Comment 7 Paul Menzel 2012-07-25 09:02:58 UTC
(In reply to comment #6)
> (In reply to comment #4)
> > Ok, in IRC channel #dbus (IRC) I was told to capture a backtrace. Since the
> > segfaults mostly happen during start up I will have to figure out a way to
> > start colord-sane automatically from GDB. Or a way to get a coredump I guess.
> 
> Yes, please. There isn't enough information here to locate the bug.
> 
> Richard: perhaps you can suggest how to obtain a stack trace from colord-sane?

1. I guess this is due to something, that D-Bus is not started fully in the beginning, because it is only happens during start up.

2. I tried to set up my system to do core dumps.

        $ more /etc/limits.conf 
        *       C2048000
        $ grep unlimited /etc/security/limits.conf
        *             soft      core       unlimited

But that does not seem to work since there is no message »Core dumped«.

    colord-sane[3657]: segfault at 4 ip b5614362 sp b7171fb8 error 6 in libdbus-1.so.3.7.2[b55e9000+49000]

At least I do not know where to look for the dumps if they are created.

3. Simon, is there any way to get to the debugging symbols without building the D-Bus package myself? If you have them on your system, you should be able to do

        $ echo 49000 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2

or the other hex number to find out where it segfaults.
Comment 8 Simon McVittie 2012-07-26 10:42:21 UTC
(In reply to comment #7)
> 3. Simon, is there any way to get to the debugging symbols without building the
> D-Bus package myself?

apt-get install dbus-1-dbg

> If you have them on your system, you should be able to do
> 
>         $ echo 49000 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2
> 
> or the other hex number to find out where it segfaults.

A full backtrace would be much more useful than the segfaulting line, but yes, that's a good idea. I'll try that when I have some time to look into this bug. (Anyone else interested in this bug: any extra information would be welcome).
Comment 9 Paul Menzel 2012-07-26 13:00:48 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > 3. Simon, is there any way to get to the debugging symbols without building the
> > D-Bus package myself?
> 
> apt-get install dbus-1-dbg
> 
> > If you have them on your system, you should be able to do
> > 
> >         $ echo 49000 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2
> > 
> > or the other hex number to find out where it segfaults.
> 
> A full backtrace would be much more useful than the segfaulting line,

Alright. Next thing I try and I hope will hopefully get a backtrace is changing in

    /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service

the `Exec` call to the following.

    Exec=ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane

Hopefully that will get us a core dump.

> but yes, that's a good idea. I'll try that when I have some time to look into this bug.
> (Anyone else interested in this bug: any extra information would be welcome).

Since I have no clue about all this I found the following two Web pages quite useful. Although both state that such things happening in shared libraries makes it more difficult to debug. But the first one offers a tutorial for that anyhow.

[1] http://coredump.io/blog/2012/02/23/debugging-segfaults-from-logs-to-gdb/
[2] http://stackoverflow.com/questions/2549214/interpreting-segfault-messages
Comment 10 Paul Menzel 2012-07-27 14:12:25 UTC
(In reply to comment #9)
> (In reply to comment #8)

[…]

> > A full backtrace would be much more useful than the segfaulting line,
> 
> Alright. Next thing I try and I hope will hopefully get a backtrace is changing
> in
> 
>     /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service
> 
> the `Exec` call to the following.
> 
>     Exec=ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane
> 
> Hopefully that will get us a core dump.

Unfortunately that does not work.

    dbus[1807]: [system] Activating service name='org.freedesktop.colord-sane' (using servicehelper)
    dbus[1807]: [system] Activated service 'org.freedesktop.colord-sane' failed: Cannot launch daemon, file not found or permissions invalid

Any hints on how to start

    /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service

manually with some `dbus` foo?

[…]
Comment 11 Paul Menzel 2012-09-13 21:49:42 UTC
As an update, on another system the crash happens when the system is shutting down.

    colord-sane[2629]: segfault at 8 ip 00007f410489f66c sp 00007f4104875b78 error 6 in libdbus-1.so.3.7.2[7f4104877000+44000]
Comment 12 Paul Menzel 2012-09-13 21:51:22 UTC
For those not knowing already, the difference of +49000 and + 44000 seems to origin in the used architecture (i386 vs. amd64).
Comment 13 Paul Menzel 2012-09-14 09:49:28 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > 3. Simon, is there any way to get to the debugging symbols without building the
> > > D-Bus package myself?
> > 
> > apt-get install dbus-1-dbg
> > 
> > > If you have them on your system, you should be able to do
> > > 
> > >         $ echo 49000 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2
> > > 
> > > or the other hex number to find out where it segfaults.
> > 
> > A full backtrace would be much more useful than the segfaulting line,

[ Unsuccessful in getting a backtrace. ]

> > but yes, that's a good idea. I'll try that when I have some time to look into this bug.
> > (Anyone else interested in this bug: any extra information would be welcome).
> 
> Since I have no clue about all this I found the following two Web pages quite
> useful. Although both state that such things happening in shared libraries
> makes it more difficult to debug. But the first one offers a tutorial for that
> anyhow.

With the help of thiago in #dbus on irc.freenode.net, I found out the following.

1. thiago looking into the Linux source code

                         printk("%s%s[%lx+%lx]", prefix, p,
                                         vma->vm_start,
                                         vma->vm_end - vma->vm_start);

he found out that in the line

        colord-sane[3657]: segfault at 4 ip b5614362 sp b7171fb8 error 6 in libdbus-1.so.3.7.2[b55e9000+49000]

we are interested in the address after `ip`. To get the pointer, we calculate the difference b5614362 - b55e9000, which is 0x2b362.

`addr2line` still does not give us any code line.

    $ echo 0x2b362 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2 
    ??:0
    $ echo 2b362 | addr2line -e /lib/i386-linux-gnu/libdbus-1.so.3.7.2 
    ??:0

But using GDB we get the following.

    $ gdb -q /lib/i386-linux-gnu/libdbus-1.so.3.7.2 
    Reading symbols from /lib/i386-linux-gnu/libdbus-1.so.3.7.2...Reading symbols from /usr/lib/debug/.build-id/f0/8f17cd1f791e9fad9f8528f7146a467c3736b5.debug...done.
    done.
    (gdb) p/a 0x2B362
    $1 = 0x2b362 <_dbus_list_unlink+18>
    (gdb) l *0x2B362
    0x2b362 is in _dbus_list_unlink (../../dbus/dbus-list.c:498).
    493	../../dbus/dbus-list.c: Datei oder Verzeichnis nicht gefunden.

> [1] http://coredump.io/blog/2012/02/23/debugging-segfaults-from-logs-to-gdb/
> [2] http://stackoverflow.com/questions/2549214/interpreting-segfault-messages
Comment 14 Paul Menzel 2012-09-14 09:53:16 UTC
Calculating all the locations, it looks like it happens in two different places.

libdbus-1.so.3.7.0 (i386 userspace with amd64 Linux kernel)
00000000f571d482-f56f2000 2B482
00000000f562748a-f55fc000 2B48A
00000000f564a482-f561f000 2B482

libdbus-1.so.3.7.1 (i386 userspace and Linux kernel)
b55de34a-b55b3000 2B34A
b56b834a-b568d000 2B34A
b560734a-b55dc000 2B34A
b565c342-b5631000 2B342
b568c34a-b5661000 2B34A
b55db342-b55b0000 2B342

libdbus-1.so.3.7.2 (i386 userspace and Linux kernel)
b5614362-b55e9000 2B362

libdbus-1.so.3.7.2 (amd64 userspace and Linux kernel)
00007f410489f66c-7f4104877000 2866C


Probably a script could be written to do that automatically using bc. Please note that bc needs all letters for hex numbers in upper case [1]. See `man bash` for that.

$ bc
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
ibase=16
F+F
30
obase=F
F+F
20
obase=10
F+F
1E
00000000f564a482-f561f000
(standard_in) 7: syntax error
00000000F564A482-F561F000
2B482

[1] http://doc-tcpip.org/Allgemein/bc.convert.html
Comment 15 Paul Menzel 2012-10-02 11:13:02 UTC
It also fails with the latest packages of version 1.6.8-1 in Debian Sid/unstable [1].

    [ 2599.845031] colord-sane[2680]: segfault at 41 ip 00007f13d9b347d7 sp 00007f13cc9ceb78 error 6 in libdbus-1.so.3.7.2[7f13d9b0c000+44000]


[1] http://packages.debian.org/changelogs/pool/main/d/dbus/dbus_1.6.8-1/changelog
Comment 16 Simon McVittie 2012-10-02 17:14:10 UTC
Recent colord uploads to Debian:

http://packages.qa.debian.org/c/colord.html

suggest that this might have been a colord bug all along: 

   * debian/patches/fix_colord-sane_dbus_threading.patch:
     - Initialise libdbus threadsafety in colord-sane. (Probably) fixes multiple
       crashes from colord-sane in libdbus (LP: #844286). Plausibly fixes other
       bad colord-sane behaviour, such as 100% CPU usage (Closes: 668325)

libdbus isn't thread-safe unless you tell it to be.

Unfortunately, we can't make it thread-safe-by-default[1], because it supports running on systems where it might run out of memory and see malloc() returning NULL, and becoming thread-safe requires allocating memory (so can fail).

[1] or at least, not in a nice way: some entry points have no way to report an error, so if allocating memory for mutexes failed at one of those entry points, it'd have to either block until more memory was available, or abort()
Comment 17 Simon McVittie 2012-10-02 17:14:55 UTC
(In reply to comment #16)
> Recent colord uploads to Debian:
> 
> http://packages.qa.debian.org/c/colord.html
> 
> suggest that this might have been a colord bug all along

Forgot to say: so please try with colord (>= 0.1.21-3) and see if that fixes it.
Comment 18 Paul Menzel 2012-10-02 21:01:04 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Recent colord uploads to Debian:
> > 
> > http://packages.qa.debian.org/c/colord.html
> > 
> > suggest that this might have been a colord bug all along
> 
> Forgot to say: so please try with colord (>= 0.1.21-3) and see if that fixes
> it.

I forgot to say, that I had these updated packages already installed. The result is that the segmentation fault (as pasted above) now happens during shutdown and not during the start.
Comment 19 Simon McVittie 2012-10-03 07:13:42 UTC
For best results, have debugging symbols for at least libdbus and colord (install dbus-1-dbg, and rebuild colord with debug symbols as per <http://wiki.debian.org/HowToGetABacktrace> since it doesn't seem to have a corresponding -dbg package).

(In reply to comment #9)
> Alright. Next thing I try and I hope will hopefully get a backtrace is
> changing in
> 
>     /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service
> 
> the `Exec` call to the following.
> 
>     Exec=ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane

I think D-Bus requires the Exec in the .service file to be a (full path to a) command + arguments, so you could use:

    Exec=/bin/sh -c "ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane"

Having obtained a core, you can turn that into a backtrace with:

    gdb /usr/lib/*/colord/colord-sane core
    (at the gdb prompt) thread apply all bt full

(In reply to comment #14)
> libdbus-1.so.3.7.0 (i386 userspace with amd64 Linux kernel)
> 00000000f571d482-f56f2000 2B482

I'm afraid this is meaningless to us; even if we knew which function this was in (which in principle we could work out with objdump or something), we also need to know which function it was called from, which function called *that*, and so on. That's the information a full backtrace would give us.
Comment 20 Paul Menzel 2012-10-03 14:54:34 UTC
(In reply to comment #19)
> For best results, have debugging symbols for at least libdbus and colord
> (install dbus-1-dbg, and rebuild colord with debug symbols as per
> <http://wiki.debian.org/HowToGetABacktrace> since it doesn't seem to have a
> corresponding -dbg package).

Sure.

> (In reply to comment #9)
> > Alright. Next thing I try and I hope will hopefully get a backtrace is
> > changing in
> > 
> >     /usr/share/dbus-1/system-services/org.freedesktop.colord-sane.service
> > 
> > the `Exec` call to the following.
> > 
> >     Exec=ulimit -c unlimited && /usr/lib/i386-linux-gnu/colord/colord-sane
> 
> I think D-Bus requires the Exec in the .service file to be a (full path to
> a) command + arguments, so you could use:
> 
>     Exec=/bin/sh -c "ulimit -c unlimited &&
> /usr/lib/i386-linux-gnu/colord/colord-sane"

Thank you very much. On one system this seems to work. On a different one I got the following.

dbus[1980]: [system] Activating service name='org.freedesktop.ColorManager' (using servicehelper)
dbus[1980]: [system] Successfully activated service 'org.freedesktop.ColorManager'
dbus[1980]: [system] Activating service name='org.freedesktop.colord-sane' (using servicehelper)
dbus[1980]: [system] Activated service 'org.freedesktop.colord-sane' failed: Launch helper exited with unknown return code 127

This is on a SSD system. I probably need to pass some debug options to get more information.

> Having obtained a core, you can turn that into a backtrace with:
> 
>     gdb /usr/lib/*/colord/colord-sane core
>     (at the gdb prompt) thread apply all bt full
> 
> (In reply to comment #14)
> > libdbus-1.so.3.7.0 (i386 userspace with amd64 Linux kernel)
> > 00000000f571d482-f56f2000 2B482
> 
> I'm afraid this is meaningless to us; even if we knew which function this
> was in (which in principle we could work out with objdump or something), we
> also need to know which function it was called from, which function called
> *that*, and so on. That's the information a full backtrace would give us.

Sure. Unfortunately nothing could be captured until now. :/ Thanks for the help and hopefully I will succeed in getting a backtrace.
Comment 21 Simon McVittie 2012-10-03 18:15:14 UTC
(In reply to comment #20)
> dbus[1980]: [system] Activated service 'org.freedesktop.colord-sane' failed:
> Launch helper exited with unknown return code 127

Shells exit with 127 when they can't find an executable. Is this other system perhaps running amd64 userland rather than i386, so it has /usr/lib/x86_64-linux-gnu/colord/colord-sane instead?
Comment 22 Paul Menzel 2012-10-03 19:43:30 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > dbus[1980]: [system] Activated service 'org.freedesktop.colord-sane' failed:
> > Launch helper exited with unknown return code 127
> 
> Shells exit with 127 when they can't find an executable. Is this other
> system perhaps running amd64 userland rather than i386, so it has
> /usr/lib/x86_64-linux-gnu/colord/colord-sane instead?

Spot on! I will make that change and tomorrow we will see if it will work. Thanks!
Comment 23 Paul Menzel 2012-10-07 06:05:40 UTC
(In reply to comment #22)
> (In reply to comment #21)
> > (In reply to comment #20)
> > > dbus[1980]: [system] Activated service 'org.freedesktop.colord-sane' failed:
> > > Launch helper exited with unknown return code 127
> > 
> > Shells exit with 127 when they can't find an executable. Is this other
> > system perhaps running amd64 userland rather than i386, so it has
> > /usr/lib/x86_64-linux-gnu/colord/colord-sane instead?
> 
> Spot on! I will make that change and tomorrow we will see if it will work.
> Thanks!

Alright. First I could not find any core files. Probably that was due to that `/usr/lib/x86_64-linux-gnu/colord/` is only writable by root. `man core` gives some hints how to change the location and luckily on this system `/tmp` is on the disk and not in RAM, so

    echo "/tmp/core.%p.%E" | sudo tee /proc/sys/kernel/core_pattern

is one solution to get the core dump file saved.
Comment 24 Paul Menzel 2012-10-07 06:07:00 UTC
Thread 3 (Thread 0x7f70ea836700 (LWP 2247)):
#0  0x00007f70ec47fac3 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
        _a3 = -1
        _a1 = 140122838279216
        resultvar = <optimized out>
        _a2 = 3
        oldtype = 0
        result = <optimized out>
#1  0x00007f70ecbe5624 in g_main_context_poll (n_fds=3, fds=0x7f70e4002430, timeout=-1, context=0x15ff640, priority=<optimized out>)
    at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3440
        poll_func = 0x7f70ecbf3410 <g_poll>
#2  g_main_context_iterate (context=0x15ff640, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3141
        max_priority = 2147483647
        timeout = -1
        some_ready = <optimized out>
        nfds = 3
        allocated_nfds = <optimized out>
        fds = 0x7f70e4002430
#3  0x00007f70ecbe5a82 in g_main_loop_run (loop=0x15ff5f0) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3340
        __PRETTY_FUNCTION__ = "g_main_loop_run"
#4  0x00007f70ed5bba86 in gdbus_shared_thread_func (user_data=0x15ff610)
    at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./gio/gdbusprivate.c:277
        data = 0x15ff610
#5  0x00007f70ecc07f45 in g_thread_proxy (data=0x15f8370) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gthread.c:801
        thread = 0x15f8370
#6  0x00007f70ec73fb50 in start_thread (arg=<optimized out>) at pthread_create.c:304
        __res = <optimized out>
        pd = 0x7f70ea836700
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140122947544832, -509398390679298109, 140122955934736, 140122947545536, 
                140123008782400, 3, 436732454272212931, 436735068657655747}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#7  0x00007f70ec48a70d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
No locals.
#8  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 0x7f70ee2797a0 (LWP 2233)):
#0  0x00007f70ec47fac3 in *__GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:87
        _a3 = -1
        _a1 = 23022384
        resultvar = <optimized out>
        _a2 = 2
        oldtype = 0
        result = <optimized out>
#1  0x00007f70ecbe5624 in g_main_context_poll (n_fds=2, fds=0x15f4b30, timeout=-1, context=0x15e6530, priority=<optimized out>)
    at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3440
        poll_func = 0x7f70ecbf3410 <g_poll>
#2  g_main_context_iterate (context=0x15e6530, block=block@entry=1, dispatch=dispatch@entry=1, 
    self=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3141
        max_priority = 2147483647
        timeout = -1
        some_ready = <optimized out>
        nfds = 2
        allocated_nfds = <optimized out>
        fds = 0x15f4b30
#3  0x00007f70ecbe5a82 in g_main_loop_run (loop=0x15e64b0) at /tmp/buildd/glib2.0-2.33.12+really2.32.4/./glib/gmain.c:3340
        __PRETTY_FUNCTION__ = "g_main_loop_run"
#4  0x00000000004021fa in main ()
No symbol table info available.

Thread 1 (Thread 0x7f70db77b700 (LWP 2665)):
#0  0x00007f70ec97d7cc in _dbus_list_unlink (list=list@entry=0x7f70db77aba8, link=link@entry=0x16b8180) at ../../dbus/dbus-list.c:498
No locals.
#1  0x00007f70ec97d9f9 in _dbus_list_pop_first_link (list=list@entry=0x7f70db77aba8) at ../../dbus/dbus-list.c:625
        link = 0x16b8180
#2  0x00007f70ec961e5b in _dbus_connection_unlock (connection=connection@entry=0x7f70ec734eb8) at ../../dbus/dbus-connection.c:424
        expired_messages = 0x16b8180
        iter = <optimized out>
#3  0x00007f70ec962929 in dbus_connection_get_is_connected (connection=0x7f70ec734eb8) at ../../dbus/dbus-connection.c:2936
        res = 1
        __FUNCTION__ = "dbus_connection_get_is_connected"
#4  0x00007f70db781096 in avahi_client_is_connected (client=0x16991e0) at client.c:901
No locals.
#5  0x00007f70db785134 in avahi_service_browser_free (b=0x169bee0) at browser.c:698
        client = 0x16991e0
        r = 0
        __PRETTY_FUNCTION__ = "avahi_service_browser_free"
#6  0x00007f70dbba2668 in net_avahi_callback (c=0x16991e0, state=<optimized out>, userdata=<optimized out>) at net.c:854
        proto = <optimized out>
        error = -23
#7  0x00007f70db7814ae in filter_func (bus=<optimized out>, message=0x169ca40, userdata=0x16991e0) at client.c:300
        client = 0x16991e0
        error = {name = 0x0, message = 0x0, dummy1 = 1, dummy2 = 0, dummy3 = 0, dummy4 = 1, dummy5 = 1, padding1 = 0x7f70db77acf8}
        __PRETTY_FUNCTION__ = "filter_func"
#8  0x00007f70ec96553e in dbus_connection_dispatch (connection=0x1699810) at ../../dbus/dbus-connection.c:4631
        filter = <optimized out>
        next = 0x0
        message = 0x169ca40
        link = <optimized out>
        filter_list_copy = 0x1654640
        message_link = 0x1654778
        result = DBUS_HANDLER_RESULT_NOT_YET_HANDLED
        pending = <optimized out>
        reply_serial = <optimized out>
        status = <optimized out>
        found_object = <optimized out>
        __FUNCTION__ = "dbus_connection_dispatch"
#9  0x00007f70db7888b6 in dispatch_timeout_callback (t=<optimized out>, userdata=<optimized out>) at ../avahi-common/dbus-watch-glue.c:105
        d = 0x164f1d0
#10 0x00007f70db9963f0 in avahi_simple_poll_dispatch (s=0x1699130) at simple-watch.c:570
        next_timeout = 0x1699280
        w = <optimized out>
        __PRETTY_FUNCTION__ = "avahi_simple_poll_dispatch"
#11 0x00007f70db9966dd in avahi_simple_poll_loop (s=0x1699130) at simple-watch.c:646
        r = <optimized out>
        __PRETTY_FUNCTION__ = "avahi_simple_poll_loop"
#12 0x00007f70db99675c in thread (userdata=0x16990e0) at thread-watch.c:71
        p = 0x16990e0
        mask = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
#13 0x00007f70ec73fb50 in start_thread (arg=<optimized out>) at pthread_create.c:304
        __res = <optimized out>
        pd = 0x7f70db77b700
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140122695120640, -509398390679298109, 140122980131232, 140122695121344, 
                140123008782400, 3, 436838390177439683, 436735068657655747}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, 
            data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#14 0x00007f70ec48a70d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
No locals.
#15 0x0000000000000000 in ?? ()
No symbol table info available.
quit
Comment 25 Paul Menzel 2012-10-07 06:08:47 UTC
So looking at the trace (Why did Bugzilla not collapse it?), it looks like Avahi and libsane are involved.
Comment 26 Simon McVittie 2012-10-08 11:10:16 UTC
(In reply to comment #24)
> Thread 1 (Thread 0x7f70db77b700 (LWP 2665)):

This is presumably the thread that caught the segfault, since the others are blocking in poll() which isn't in libdbus.

> #0  0x00007f70ec97d7cc in _dbus_list_unlink (list=list@entry=0x7f70db77aba8,
> link=link@entry=0x16b8180) at ../../dbus/dbus-list.c:498
> No locals.

This line is:

    link->prev->next = link->next;

which can only fail if the linked list is somehow corrupt (libdbus uses circular linked lists, so prev and next are always non-NULL).

> #2  0x00007f70ec961e5b in _dbus_connection_unlock
> (connection=connection@entry=0x7f70ec734eb8) at
> ../../dbus/dbus-connection.c:424
>         expired_messages = 0x16b8180
>         iter = <optimized out>

Context: we're going through the list of "expired messages" (messages that can't safely be freed until the connection is unlocked) and freeing them all. We do this by moving the connection's list to a temporary variable while under the lock, setting the connection's list to NULL, unlocking, and then freeing the temporary list.

Other than that, the list is only changed by _dbus_connection_message_sent_unlocked, _dbus_connection_peer_filter_unlocked_no_update and dbus_connection_dispatch which are all under the lock.

Ideas, anyone?
Comment 27 Simon McVittie 2012-10-08 11:39:01 UTC
(In reply to comment #25)
> So looking at the trace (Why did Bugzilla not collapse it?), it looks like
> Avahi and libsane are involved.

AIUI, Bugzilla didn't collapse it because that isn't a standard feature, and freedesktop.org doesn't have all the same plugins as gnome.org.

Thread 2 in your backtrace is the main thread.

Thread 3 is the GDBus worker thread.

Thread 1 is libavahi-client (which uses libdbus) running in another worker thread. This appears to come from backend/net.c in sane-backends?

Looking at backend/net.c, I am very suspicious about the thread-safety of its Avahi interactions. It has a linked list of devices starting at

    static Net_Device *first_device;

which is written by add_device(), which appears to happen in the Avahi thread. It's also read by sane_exit(), sane_get_devices() and sane_open() with no apparent locking.

This is basically nothing to do with libdbus, but because the threads share a process-space, writing through a bad pointer anywhere in the process could have action-at-a-distance effects - so it seems worth looking into.

It might be worth running colord-sane under one of Valgrind's thread-debugging tools (Helgrind or DRD), which are designed to detect whether multi-threaded memory accesses are deterministic.
Comment 28 Simon McVittie 2012-12-06 18:05:01 UTC
(In reply to comment #27)
> Looking at backend/net.c, I am very suspicious about the thread-safety of
> its Avahi interactions.

Forwarded to:

https://alioth.debian.org/tracker/index.php?func=detail&aid=313921&group_id=30186&atid=410366

Hopefully we'll get some feedback from there?
Comment 29 Valentin Brasov 2013-04-07 18:51:10 UTC
Created attachment 77563 [details]
logs for shutdown hanging and the start afterwards
Comment 30 Valentin Brasov 2013-04-07 18:54:04 UTC
Comment on attachment 77563 [details]
logs for shutdown hanging and the start afterwards

Hi,

Any progress in fixing this?
I am using linux mint debian edition cinnamon 64bit 201303 with dbus 1.6.8-1 and colord 0.1.21-1 installed and almost every day, the shutdown/logout/reboot hangs my laptop!

https://bugs.launchpad.net/linuxmint/+bug/1163484

Each time when it hangs, I can see in the system log:
[ 9018.025165] colord-sane[3735]: segfault at 58 ip 00007fd6206507cc sp 00007fd61062fb78 error 6 in libdbus-1.so.3.7.2[7fd620628000+44000]

See attached syslog_for_shutdown_and_start_afterwards.log.
Comment 31 Richard Hughes 2013-04-07 19:42:17 UTC
(In reply to comment #30)
> Any progress in fixing this?

0.1.21 is really old. 0.1.32 was released recently, can you try with a newer version please.
Comment 32 Simon McVittie 2013-05-16 07:50:58 UTC
(In reply to comment #30)
> I am using linux mint debian edition cinnamon 64bit 201303 with dbus 1.6.8-1
> and colord 0.1.21-1 installed and almost every day, the
> shutdown/logout/reboot hangs my laptop!

I can't see how a crash in colord-sane would hang your laptop. Try changing "UseSANE=true" to "UseSANE=false" in /etc/colord.conf, which should stop colord-sane from being run (so it can't crash). If you still get the hangs, then you have some other unrelated bug.
Comment 33 stevesr0 2013-07-29 20:58:09 UTC
After installing simple-scan from Debian Wheezy, I see the following at boot: colord-sane [3105] segfault at 15e ip b54f9f06 sp b54e504c error 4 in libdbus-1.so.3.7.2[b54e6000+49000]

In spite of the error message, simple-scan still works, both for a black and white page and a colored image.

I am not enlightened about the analysis of this error and just reporting my experience to be a good citizen.

If there is any additional information I can provide, I will be happy to attempt to do so.

Steve Rinsler
Comment 34 Simon McVittie 2014-04-30 18:08:28 UTC
This seems to have been caused by SANE not being thread-safe. colord 0.1.31 includes a rewrite of colord-sane which is believed to not have this crash - hopefully it doesn't multi-thread its interactions with SANE, Avahi and/or libdbus.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.