Summary: | Crash on shutdown, Assertion 'a >= 0 && a < _JOB_TYPE_MAX_MERGING' failed | ||
---|---|---|---|
Product: | systemd | Reporter: | Ivan Kozik <ivan> |
Component: | general | Assignee: | systemd-bugs |
Status: | RESOLVED FIXED | QA Contact: | systemd-bugs |
Severity: | critical | ||
Priority: | highest | CC: | bugs.freedesktop.org, keepitsimpleengineer, rmlipman, steven, zoomulator |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
journals with Assertion 'a >= 0 && a < _JOB_TYPE_MAX_MERGING' failed
show types of jobs in job_type_lookup_merge failed shutdown with systemd.log_level=debug another crash and another crash gzipped core dump from systemd 217 debug build coredump systemd 317 -O0 -g core dump patch: don't try to merge a JOB_NOP systemd 317 -g core dump (not -O0) |
Description
Ivan Kozik
2013-07-28 05:19:48 UTC
Note that sometimes it does shut down successfully (maybe 30% of the time, probably more likely when the system is idle). Can you try with the following patch, so that we know what values are actually there? Created attachment 83212 [details] [review] show types of jobs in job_type_lookup_merge With the patch: Jul 29 18:13:27 nixos systemd[1]: Stopped ACPI Daemon. Jul 29 18:13:27 nixos systemd[1]: Stopped Cron Daemon. Jul 29 18:13:27 nixos systemd[1]: Stopped Getty on tty1. Jul 29 18:13:27 nixos systemd[1]: systemd-logind.service: main process exited, code=exited, status=1/FAILURE Jul 29 18:13:27 nixos systemd[1]: Stopped Login Service. Jul 29 18:13:27 nixos systemd[1]: Unit systemd-logind.service entered failed state. Jul 29 18:13:27 nixos systemd[1]: Stopped SSH Daemon. Jul 29 18:13:27 nixos systemd[1]: Stopped D-Bus System Message Bus. Jul 29 18:13:27 nixos systemd[1]: Stopped NTP Daemon. Jul 29 18:13:27 nixos systemd[1]: Stopping User and Group Name Lookups. Jul 29 18:13:27 nixos systemd[1]: Stopped target User and Group Name Lookups. Jul 29 18:13:27 nixos systemd[1]: Stopping Name Service Cache Daemon... Jul 29 18:13:27 nixos systemd[1]: Stopping system-getty.slice. Jul 29 18:13:27 nixos systemd[1]: Removed slice system-getty.slice. Jul 29 18:13:27 nixos systemd[1]: Started Store Sound Card State. Jul 29 18:13:27 nixos systemd[1]: a=5 b=2 Jul 29 18:13:27 nixos systemd[1]: Assertion 'a >= 0 && a < _JOB_TYPE_MAX_MERGING' failed at src/core/job.c:341, function job_type_lookup_merge(). Aborting. Jul 29 18:13:27 nixos systemd[1]: Caught <ABRT>, dumped core as pid 2070. Jul 29 18:13:27 nixos systemd[1]: Freezing execution. I had to reboot a few times to get this to reproduce on systemd 206; I finally got it when I had a few processes running: two `find /`, a `ping -f localhost`, and a `cat /dev/urandom | bzip2 > /dev/null` Reproduced the bug again and I see `a=5 b=2` again. (This time, I had started about a dozen `find / > /dev/null &` before rebooting.) (In reply to comment #5) > Reproduced the bug again and I see `a=5 b=2` again. So we're trying to merge JOB_NOP and JOB_STOP. Interesting. Hm, can you show the logs with systemd.log_level=debug on the kernel commandline? Created attachment 83227 [details]
failed shutdown with systemd.log_level=debug
Crashed in (apparently) the same way, not sure why the Assertion itself isn't in the journal this time
Created attachment 83229 [details]
another crash
Created attachment 83230 [details]
and another crash
It looks like an altlinux user has also run into this: http://forum.altlinux.org/index.php?topic=29850.0 (click the second [+] to see the log) FWIW, here is the stack trace of the systemd crash on NixOS: (gdb) bt #0 0x00006cb659188b0b in raise () from /nix/store/zm4bhsm8lprkzvrjgqr0klfkvr21als4-glibc-2.17/lib/libpthread.so.0 #1 0x000000000040cbe1 in crash () #2 <signal handler called> #3 0x00006cb658bf4fb5 in raise () from /nix/store/zm4bhsm8lprkzvrjgqr0klfkvr21als4-glibc-2.17/lib/libc.so.6 #4 0x00006cb658bf6436 in abort () from /nix/store/zm4bhsm8lprkzvrjgqr0klfkvr21als4-glibc-2.17/lib/libc.so.6 #5 0x0000000000460383 in log_assert () #6 0x0000000000460ad0 in log_assert_failed () #7 0x000000000047b5c6 in job_type_lookup_merge () #8 0x0000000000414672 in transaction_activate () #9 0x000000000040e684 in manager_add_job () #10 0x0000000000443f6d in bus_unit_queue_job () #11 0x000000000043fc1f in bus_manager_message_handler () #12 0x00006cb6593b3cb9 in _dbus_object_tree_dispatch_and_unlock () from /nix/store/mv8dsy1q29ph6hdk6c96f252ww89z27m-dbus-libs-1.6.14/lib/libdbus-1.so.3 #13 0x00006cb6593a5e45 in dbus_connection_dispatch () from /nix/store/mv8dsy1q29ph6hdk6c96f252ww89z27m-dbus-libs-1.6.14/lib/libdbus-1.so.3 #14 0x000000000043e82a in bus_dispatch () #15 0x0000000000411df8 in manager_loop () #16 0x000000000040a233 in main () Is this still a problem with current git? (In reply to comment #13) > Is this still a problem with current git? I haven't seen it since NixOS upgraded systemd to version 212... OK, closing then. We are seeing this problem also on systemd 215 when restarting a service. This is on ARM (imx28) But it is fairly hard to reproduce so I did not have the possibility to apply the patch or find a good reproduction path. Oct 02 23:23:44 iris python[146]: Restarting service Oct 02 23:23:44 iris systemd[1]: Assertion 'a >= 0 && a < _JOB_TYPE_MAX_MERGING' failed at src/core/job.c:314, function job_type_lookup_merge(). Aborting. Oct 02 23:23:44 iris udhcpc[151]: deleting routers Oct 02 23:23:44 iris systemd-coredump[2465]: Failed to create coredump file /var/lib/systemd/coredump/.core.systemd.0.4cd67b66fa164cb795ec91545bb3f8cf.2464.1412292224000000a2a530d4db025192: Read-only file system Oct 02 23:23:44 iris systemd-coredump[2465]: Process 2464 (systemd) of user 0 dumped core. Oct 02 23:23:44 iris systemd[1]: Caught <ABRT>, dumped core as pid 2464. Oct 02 23:23:44 iris systemd[1]: Freezing execution. Oct 02 23:23:45 iris udhcpc[151]: route: SIOCDELRT: No such process I hope to provide extra information soon Created attachment 109372 [details]
gzipped core dump from systemd 217
I'm encountering this as well on systemd 217.
Nov 12 08:01:01 dispater crond[12331]: pam_unix(crond:session): session opened for user root by (uid=0)
Nov 12 08:01:01 dispater CROND[12332]: (root) CMD (run-parts /etc/cron.hourly)
Nov 12 08:01:01 dispater CROND[12331]: pam_unix(crond:session): session closed for user root
Nov 12 08:03:37 dispater dhcpcd[1075]: br0: fe80::2472:37ff:fea0:49d2 is unreachable, expiring it
Nov 12 08:05:03 dispater dhcpcd[1075]: br0: Router Advertisement from fe80::9844:a4ff:fe69:acd5
Nov 12 08:05:03 dispater dhcpcd[1075]: br0: adding address 2002:6898:6b85:beef:6a05:caff:fe0a:6ec2/64
Nov 12 08:05:03 dispater systemd[1]: Cannot add dependency job for unit cups.socket, ignoring: Unit cups.socket failed to load: No such file or directory.
Nov 12 08:05:03 dispater systemd[1]: Stopping Name Service Cache Daemon...
Nov 12 08:05:03 dispater systemd[1]: Assertion 'a >= 0 && a < _JOB_TYPE_MAX_MERGING' failed at src/core/job.c:314, function job_type_lookup_merge(). Aborting.
Nov 12 08:05:03 dispater systemd[1]: Caught <ABRT>, dumped core as pid 12419.
Nov 12 08:05:03 dispater systemd[1]: Freezing execution.
Nov 12 08:05:03 dispater systemd-coredump[12420]: Process 12419 (systemd) of user 0 dumped core.
Nov 12 08:05:08 dispater acpid[554]: client 608[0:0] has disconnected
Nov 12 08:05:08 dispater acpid[554]: client connected from 653[82:82]
Nov 12 08:05:08 dispater acpid[554]: 1 client rule loaded
After the above point I can't log in or do anything useful with the box except reboot it.
Backtrace via 'coredumpctl gdb':
(gdb) thread apply all bt full
Thread 1 (Thread 0x7f1fa9967780 (LWP 12419)):
#0 0x00007f1fa89260c9 in raise () from /usr/lib/libpthread.so.0
No symbol table info available.
#1 0x00007f1fa9a5b3c8 in crash.lto_priv.222 (sig=6) at src/core/main.c:168
rl = {
rlim_cur = 18446744073709551615,
rlim_max = 18446744073709551615
}
sa = {
__sigaction_handler = {
sa_handler = 0x0,
sa_sigaction = 0x0
},
sa_mask = {
__val = {0 <repeats 16 times>}
},
sa_flags = 0,
sa_restorer = 0x0
}
__func__ = "crash"
__PRETTY_FUNCTION__ = "crash"
#2 <signal handler called>
No symbol table info available.
#3 0x00007f1fa85a6967 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#4 0x00007f1fa85a7d3a in abort () from /usr/lib/libc.so.6
No symbol table info available.
#5 0x00007f1fa9a90012 in log_assert_failed (text=text@entry=0x7f1fa9aa78b8 "a >= 0 && a < _JOB_TYPE_MAX_MERGING", file=file@entry=0x7f1fa9a9a212 "src/core/job.c", line=line@entry=314, func=func@entry=0x7f1fa9ab8370 <__PRETTY_FUNCTION__.15000> "job_type_lookup_merge") at src/shared/log.c:718
No locals.
#6 0x00007f1fa99e943f in job_type_lookup_merge (a=<optimized out>, b=<optimized out>) at src/core/job.c:314
__PRETTY_FUNCTION__ = "job_type_lookup_merge"
#7 0x00007f1fa9a330a5 in job_type_is_superset () at src/core/job.h:198
No locals.
#8 transaction_is_destructive (e=<optimized out>, mode=<optimized out>, tr=<optimized out>) at src/core/transaction.c:516
i = 0xffffffffffffffff
j = 0x0
#9 transaction_activate (e=<optimized out>, mode=<optimized out>, m=<optimized out>, tr=<optimized out>) at src/core/transaction.c:722
i = 0xffffffffffffffff
j = 0x0
#10 manager_add_job (m=0x7f1fab518380, type=_JOB_TYPE_MAX_MERGING, unit=0x6, mode=_JOB_MODE_INVALID, override=false, e=0x7f1fa9b07180 <buffer>, _ret=0x7fff46a422b0) at src/core/manager.c:1224
__PRETTY_FUNCTION__ = "manager_add_job"
__func__ = "manager_add_job"
#11 0x00007f1fa9a1bbc5 in bus_unit_queue_job (bus=0x7f1fab5f9090, message=0x7f1fab66dd20, u=0x7f1fab5b5820, type=_JOB_TYPE_MAX_IN_TRANSACTION, mode=JOB_FAIL, reload_if_possible=128, error=0x7fff46a42440) at src/core/dbus-unit.c:777
path = 0x6 <error: Cannot access memory at address 0x6>
j = 0x1ced597e673294c1
__PRETTY_FUNCTION__ = "bus_unit_queue_job"
#12 0x00007f1fa9a1bfe6 in bus_unit_method_start_generic (bus=0x7f1fab5f9090, message=0x7f1fab66dd20, u=0x7f1fab5b5820, job_type=_JOB_TYPE_MAX_IN_TRANSACTION, reload_if_possible=reload_if_possible@entry=false, error=0x7fff46a42440) at src/core/dbus-unit.c:383
smode = 0x7f1fab6059f8 "fail"
mode = JOB_FAIL
r = 0
__PRETTY_FUNCTION__ = "bus_unit_method_start_generic"
#13 0x00007f1fa9a756c6 in method_start_unit_generic (bus=0x7f1fab5f9090, message=0x7f1fab66dd20, m=0x7f1fab518380, job_type=_JOB_TYPE_MAX_IN_TRANSACTION, reload_if_possible=<optimized out>, error=0x7fff46a42440) at src/core/dbus-manager.c:478
name = 0x7f1fab6059e4 "nscd.service"
u = 0x7f1fab5b5820
r = <optimized out>
__PRETTY_FUNCTION__ = "method_start_unit_generic"
#14 0x00007f1fa9a6b3ee in method_callbacks_run (found_object=<optimized out>, require_fallback=<optimized out>, c=<optimized out>, m=<optimized out>, bus=<optimized out>) at src/libsystemd/sd-bus/bus-objects.c:400
slot = 0x0
u = 0x7f1fab518380
error = {
name = 0x0,
message = 0x0,
_need_free = 0
}
signature = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>
#15 object_find_and_run.lto_priv.239 (bus=0x7f1fab5f9090, m=0x7f1fab66dd20, p=0x6 <error: Cannot access memory at address 0x6>, require_fallback=false, found_object=0x7fff46a425b0) at src/libsystemd/sd-bus/bus-objects.c:1224
vtable_key = {
path = 0x7f1fab605948 "/org/freedesktop/systemd1",
interface = 0x7f1fab605988 "org.freedesktop.systemd1.Manager",
member = 0x7f1fab605970 "TryRestartUnit",
parent = 0x1,
last_iteration = 0,
vtable = 0x7f1fa9a848e1 <bus_read_message.constprop.160+49>
}
__PRETTY_FUNCTION__ = "object_find_and_run"
#16 0x00007f1fa9a85ace in bus_process_object (m=<optimized out>, bus=0x7f1fab5f9090) at src/libsystemd/sd-bus/bus-objects.c:1340
prefix = 0x7fff46a424e0 "\220\220_\253\037\177"
r = <optimized out>
pl = <optimized out>
found_object = true
#17 process_message (m=<optimized out>, bus=0x7f1fab5f9090) at src/libsystemd/sd-bus/sd-bus.c:2409
r = <optimized out>
#18 process_running (priority=0, ret=0x0, hint_priority=false, bus=0x7f1fab5f9090) at src/libsystemd/sd-bus/sd-bus.c:2451
m = <optimized out>
r = <optimized out>
#19 bus_process_internal.constprop.164 (bus=0x7f1fab5f9090, ret=0x0, priority=0, hint_priority=false) at src/libsystemd/sd-bus/sd-bus.c:2640
_dont_destroy_bus = 0x7f1fab5f9090
#20 0x00007f1fa9a23b11 in sd_bus_process (ret=0x0, bus=<optimized out>) at src/libsystemd/sd-bus/sd-bus.c:2659
No locals.
#21 io_callback (s=<optimized out>, fd=<optimized out>, revents=<optimized out>, userdata=<optimized out>) at src/libsystemd/sd-bus/sd-bus.c:2918
bus = <optimized out>
__PRETTY_FUNCTION__ = "io_callback"
#22 0x00007f1fa9a4e680 in source_dispatch (s=0x7f1fab607ee0) at src/libsystemd/sd-event/sd-event.c:2103
__PRETTY_FUNCTION__ = "source_dispatch"
__func__ = "source_dispatch"
#23 0x00007f1fa9a53bde in sd_event_dispatch (e=0x7f1fab5187b0) at src/libsystemd/sd-event/sd-event.c:2460
p = <optimized out>
r = <optimized out>
e = 0x7f1fab5187b0
__PRETTY_FUNCTION__ = "sd_event_dispatch"
#24 0x00007f1fa9a66590 in sd_event_run (timeout=<optimized out>, e=<optimized out>) at src/libsystemd/sd-event/sd-event.c:2489
r = 0
#25 manager_loop (m=0x7f1fab518380) at src/core/manager.c:2010
wait_usec = 18446744073709551615
rl = {
interval = 1000000,
begin = 389870461922,
burst = 50000,
num = 23
}
__PRETTY_FUNCTION__ = "manager_loop"
__func__ = "manager_loop"
#26 0x00007f1fa99e6859 in main (argc=1, argv=0x7fff46a43098) at src/core/main.c:1731
m = 0x7f1fab518380
r = <optimized out>
retval = 1
before_startup = <optimized out>
after_startup = <optimized out>
timespan = "\000\000\000\000\000\000\000\000`.\244F\377\177\000\000X.\244F\377\177\000\000W.\244F\377\177\000\000P\213z\251\037\177\000\000h.\244F\377\177\000\000\000\000\000\000\034\000\000\000`.\244F\377\177\000"
fds = 0x0
reexecute = false
shutdown_verb = 0x0
initrd_timestamp = <optimized out>
userspace_timestamp = {
realtime = 1415418435002172,
monotonic = 3175634
}
kernel_timestamp = {
realtime = <optimized out>,
monotonic = 0
}
security_start_timestamp = {
realtime = 1415418435003792,
monotonic = 3177255
}
security_finish_timestamp = {
realtime = 1415418435004168,
monotonic = 3177631
}
systemd = "systemd"
skip_setup = <optimized out>
j = <optimized out>
loaded_policy = false
arm_reboot_watchdog = false
queue_default_job = <optimized out>
empty_etc = false
switch_root_dir = 0x0
switch_root_init = 0x0
__func__ = "main"
__PRETTY_FUNCTION__ = "main"
(gdb)
Attaching a core dump.
Reopening Created attachment 109380 [details]
debug build coredump
Here's a debug build (-O0 -g) with the same crash. I've got a fairly solid repro now, just have to reboot my router box and when it comes back this system crashes.
Better stack trace follows:
(gdb) thread apply all bt full
Thread 1 (Thread 0x7f5b1bb527c0 (LWP 13326)):
#0 0x00007f5b1ab110c9 in raise () from /usr/lib/libpthread.so.0
No symbol table info available.
#1 0x00007f5b1bbc9a53 in crash (sig=6) at src/core/main.c:168
rl = {
rlim_cur = 18446744073709551615,
rlim_max = 18446744073709551615
}
sa = {
__sigaction_handler = {
sa_handler = 0x0,
sa_sigaction = 0x0
},
sa_mask = {
__val = {0 <repeats 16 times>}
},
sa_flags = 0,
sa_restorer = 0x0
}
pid = 0
__func__ = "crash"
__PRETTY_FUNCTION__ = "crash"
#2 <signal handler called>
No symbol table info available.
#3 0x00007f5b1a791967 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#4 0x00007f5b1a792d3a in abort () from /usr/lib/libc.so.6
No symbol table info available.
#5 0x00007f5b1bc499fe in log_assert_failed (text=0x7f5b1bd082b8 "a >= 0 && a < _JOB_TYPE_MAX_MERGING", file=0x7f5b1bd08023 "src/core/job.c", line=314, func=0x7f5b1bd08ae0 <__PRETTY_FUNCTION__.13893> "job_type_lookup_merge") at src/shared/log.c:718
No locals.
#6 0x00007f5b1bcaab15 in job_type_lookup_merge (a=_JOB_TYPE_MAX_MERGING, b=JOB_RESTART) at src/core/job.c:314
__PRETTY_FUNCTION__ = "job_type_lookup_merge"
#7 0x00007f5b1bbda1c3 in job_type_is_superset (a=_JOB_TYPE_MAX_MERGING, b=JOB_RESTART) at src/core/job.h:198
No locals.
#8 0x00007f5b1bbdba67 in transaction_is_destructive (tr=0x7f5b1c484e50, mode=JOB_FAIL, e=0x7fff60c9dd60) at src/core/transaction.c:516
i = 0xffffffffffffffff
j = 0x7f5b1c548a90
__PRETTY_FUNCTION__ = "transaction_is_destructive"
#9 0x00007f5b1bbdc31e in transaction_activate (tr=0x7f5b1c484e50, m=0x7f5b1c46a370, mode=JOB_FAIL, e=0x7fff60c9dd60) at src/core/transaction.c:722
i = 0xffffffffffffffff
j = 0x0
r = 0
generation = 3
__PRETTY_FUNCTION__ = "transaction_activate"
__func__ = "transaction_activate"
#10 0x00007f5b1bbd3587 in manager_add_job (m=0x7f5b1c46a370, type=_JOB_TYPE_MAX_MERGING, unit=0x7f5b1c51cf40, mode=JOB_FAIL, override=true, e=0x7fff60c9dd60, _ret=0x7fff60c9dbd0) at src/core/manager.c:1224
r = 0
tr = 0x7f5b1c484e50
__PRETTY_FUNCTION__ = "manager_add_job"
__func__ = "manager_add_job"
#11 0x00007f5b1bc19ca9 in bus_unit_queue_job (bus=0x7f5b1c47cfc0, message=0x7f5b1c549010, u=0x7f5b1c51cf40, type=_JOB_TYPE_MAX_IN_TRANSACTION, mode=JOB_FAIL, reload_if_possible=false, error=0x7fff60c9dd60) at src/core/dbus-unit.c:777
path = 0x0
j = 0x7f5b1c46a790
r = 0
__PRETTY_FUNCTION__ = "bus_unit_queue_job"
#12 0x00007f5b1bc18b54 in bus_unit_method_start_generic (bus=0x7f5b1c47cfc0, message=0x7f5b1c549010, u=0x7f5b1c51cf40, job_type=_JOB_TYPE_MAX_IN_TRANSACTION, reload_if_possible=false, error=0x7fff60c9dd60) at src/core/dbus-unit.c:383
smode = 0x7f5b1c484f58 "fail"
mode = JOB_FAIL
r = 1
__PRETTY_FUNCTION__ = "bus_unit_method_start_generic"
#13 0x00007f5b1bc12745 in method_start_unit_generic (bus=0x7f5b1c47cfc0, message=0x7f5b1c549010, m=0x7f5b1c46a370, job_type=_JOB_TYPE_MAX_IN_TRANSACTION, reload_if_possible=false, error=0x7fff60c9dd60) at src/core/dbus-manager.c:478
name = 0x7f5b1c484f44 "nscd.service"
u = 0x7f5b1c51cf40
r = 1
__PRETTY_FUNCTION__ = "method_start_unit_generic"
#14 0x00007f5b1bc12899 in method_try_restart_unit (bus=0x7f5b1c47cfc0, message=0x7f5b1c549010, userdata=0x7f5b1c46a370, error=0x7fff60c9dd60) at src/core/dbus-manager.c:498
No locals.
#15 0x00007f5b1bc81167 in method_callbacks_run (bus=0x7f5b1c47cfc0, m=0x7f5b1c549010, c=0x7f5b1c4854b0, require_fallback=false, found_object=0x7fff60c9de83) at src/libsystemd/sd-bus/bus-objects.c:400
slot = 0x7f5b1c53c4b0
error = {
name = 0x0,
message = 0x0,
_need_free = 0
}
signature = 0x7f5b1c476690 "ss"
u = 0x7f5b1c46a370
r = 1
__PRETTY_FUNCTION__ = "method_callbacks_run"
#16 0x00007f5b1bc83901 in object_find_and_run (bus=0x7f5b1c47cfc0, m=0x7f5b1c549010, p=0x7f5b1c484ea8 "/org/freedesktop/systemd1", require_fallback=false, found_object=0x7fff60c9de83) at src/libsystemd/sd-bus/bus-objects.c:1224
n = 0x7f5b1c548e10
vtable_key = {
path = 0x7f5b1c484ea8 "/org/freedesktop/systemd1",
interface = 0x7f5b1c484ee8 "org.freedesktop.systemd1.Manager",
member = 0x7f5b1c484ed0 "TryRestartUnit",
parent = 0x8c777b69f5dd8c00,
last_iteration = 0,
vtable = 0x7f5b1bbc95d3 <_start>
}
v = 0x7f5b1c4854b0
r = 0
__PRETTY_FUNCTION__ = "object_find_and_run"
#17 0x00007f5b1bc83f33 in bus_process_object (bus=0x7f5b1c47cfc0, m=0x7f5b1c549010) at src/libsystemd/sd-bus/bus-objects.c:1340
prefix = <error reading variable prefix (frame address is not available.)>
r = 32603
pl = 25
found_object = true
__PRETTY_FUNCTION__ = "bus_process_object"
#18 0x00007f5b1bc91239 in process_message (bus=0x7f5b1c47cfc0, m=0x7f5b1c549010) at src/libsystemd/sd-bus/sd-bus.c:2409
r = 0
__PRETTY_FUNCTION__ = "process_message"
__func__ = "process_message"
#19 0x00007f5b1bc9140e in process_running (bus=0x7f5b1c47cfc0, hint_priority=false, priority=0, ret=0x0) at src/libsystemd/sd-bus/sd-bus.c:2451
m = 0x7f5b1c549010
r = 1
__PRETTY_FUNCTION__ = "process_running"
__func__ = "process_running"
#20 0x00007f5b1bc91c50 in bus_process_internal (bus=0x7f5b1c47cfc0, hint_priority=false, priority=0, ret=0x0) at src/libsystemd/sd-bus/sd-bus.c:2640
_dont_destroy_bus = 0x7f5b1c47cfc0
r = 32603
__PRETTY_FUNCTION__ = "bus_process_internal"
#21 0x00007f5b1bc91d21 in sd_bus_process (bus=0x7f5b1c47cfc0, ret=0x0) at src/libsystemd/sd-bus/sd-bus.c:2659
No locals.
#22 0x00007f5b1bc924bd in io_callback (s=0x7f5b1c46b000, fd=19, revents=1, userdata=0x7f5b1c47cfc0) at src/libsystemd/sd-bus/sd-bus.c:2918
bus = 0x7f5b1c47cfc0
r = 32767
__PRETTY_FUNCTION__ = "io_callback"
#23 0x00007f5b1bc987c4 in source_dispatch (s=0x7f5b1c46b000) at src/libsystemd/sd-event/sd-event.c:2103
r = 0
__PRETTY_FUNCTION__ = "source_dispatch"
__func__ = "source_dispatch"
#24 0x00007f5b1bc99b44 in sd_event_dispatch (e=0x7f5b1c46ab10) at src/libsystemd/sd-event/sd-event.c:2460
p = 0x7f5b1c46b000
r = 32767
__PRETTY_FUNCTION__ = "sd_event_dispatch"
#25 0x00007f5b1bc99cd5 in sd_event_run (e=0x7f5b1c46ab10, timeout=18446744073709551615) at src/libsystemd/sd-event/sd-event.c:2489
r = 1
__PRETTY_FUNCTION__ = "sd_event_run"
#26 0x00007f5b1bbd5ee9 in manager_loop (m=0x7f5b1c46a370) at src/core/manager.c:2010
wait_usec = 18446744073709551615
r = 1
rl = {
interval = 1000000,
begin = 37569410558,
burst = 50000,
num = 26
}
__PRETTY_FUNCTION__ = "manager_loop"
__func__ = "manager_loop"
#27 0x00007f5b1bbceb93 in main (argc=4, argv=0x7fff60c9e468) at src/core/main.c:1731
m = 0x7f5b1c46a370
r = 0
retval = 1
before_startup = 37382632984
after_startup = 0
timespan = "\004\000\000\000\000\000\000\000\240\343\311`\377\177\000\000\330t\265\033[\177\000\000\000\000\000\000\000\000\000\000\002\000\000\000\300\237\000\000\r\206\315\033[\177\000\000\302", '\000' <repeats 14 times>
fds = 0x0
reexecute = false
shutdown_verb = 0x0
initrd_timestamp = {
realtime = 0,
monotonic = 0
}
userspace_timestamp = {
realtime = 1415856026408304,
monotonic = 37382632434
}
kernel_timestamp = {
realtime = 1415818643775870,
monotonic = 0
}
security_start_timestamp = {
realtime = 0,
monotonic = 0
}
security_finish_timestamp = {
realtime = 0,
monotonic = 0
}
systemd = "systemd"
skip_setup = true
j = 32767
loaded_policy = false
arm_reboot_watchdog = false
queue_default_job = false
empty_etc = false
switch_root_dir = 0x0
switch_root_init = 0x0
saved_rlimit_nofile = {
rlim_cur = 1024,
rlim_max = 4096
}
__func__ = "main"
__PRETTY_FUNCTION__ = "main"
Created attachment 109381 [details]
systemd 317 -O0 -g core dump
Created attachment 109383 [details] [review] patch: don't try to merge a JOB_NOP From my read of the code, here's what happens. The stack traces are slightly obfuscated by the fact we have duplicate values in the JobType enum, so mentally translate these when reading through it: _JOB_TYPE_MAX_IN_TRANSACTION is JOB_TRY_RESTART _JOB_TYPE_MAX_MERGING is JOB_NOP The transaction arrives as a JOB_TRY_RESTART, but between the entry of manager_add_job and when it calls transaction_activate, the type changes to JOB_NOP. This appears to happen because of job_type_collapse, which notices the unit is already deactivating or inactive and modifies the passed JobType in place to be a JOB_NOP. Deeper down the stack, transaction_is_destructive tries to use job_type_is_superset on the JOB_NOP, which will fail since it's not a mergeable job type. Skip the job_type_is_superset on a JOB_NOP and we should be good. Possible patch attached. Not sure if it's the right approach, though. Still reading through the code to try to understand the transaction/merging semantics. :) Created attachment 110146 [details]
systemd 317 -g core dump (not -O0)
I get a similar crash, but instead of on shutdown, it happens either at random or on boot.
Trace for my situation follows:
(gdb) thread apply all bt full
Thread 1 (Thread 0x7f0a24a337c0 (LWP 593)):
#0 0x00007f0a23c070c9 in raise () from /usr/lib/libpthread.so.0
No symbol table info available.
#1 0x00007f0a24b1eeb8 in crash.lto_priv.222 (sig=6) at src/core/main.c:168
rl = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
sa = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0,
sa_restorer = 0x0}
__func__ = "crash"
__PRETTY_FUNCTION__ = "crash"
#2 <signal handler called>
No symbol table info available.
#3 0x00007f0a23887967 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#4 0x00007f0a23888d3a in abort () from /usr/lib/libc.so.6
No symbol table info available.
#5 0x00007f0a24b4dee2 in log_assert_failed (
text=text@entry=0x7f0a24b65760 "a >= 0 && a < _JOB_TYPE_MAX_MERGING",
file=file@entry=0x7f0a24b580c0 "src/core/job.c", line=line@entry=314,
func=func@entry=0x7f0a24b76170 <__PRETTY_FUNCTION__.14671> "job_type_lookup_merge") at src/shared/log.c:718
No locals.
#6 0x00007f0a24aa756f in job_type_lookup_merge (a=<optimized out>,
b=<optimized out>) at src/core/job.c:314
__PRETTY_FUNCTION__ = "job_type_lookup_merge"
#7 0x00007f0a24b21825 in job_type_is_superset () at src/core/job.h:198
No locals.
#8 transaction_is_destructive (e=<optimized out>, mode=<optimized out>,
tr=<optimized out>) at src/core/transaction.c:516
i = 0xffffffffffffffff
j = 0x0
#9 transaction_activate (e=<optimized out>, mode=<optimized out>,
m=<optimized out>, tr=<optimized out>) at src/core/transaction.c:722
i = 0xffffffffffffffff
j = 0x0
#10 manager_add_job (m=0x7f0a26420380, type=_JOB_TYPE_MAX_MERGING, unit=0x6,
mode=_JOB_MODE_INVALID, override=false, e=0x1, _ret=0x7fff32c31fb0)
at src/core/manager.c:1224
__PRETTY_FUNCTION__ = "manager_add_job"
__func__ = "manager_add_job"
#11 0x00007f0a24acd015 in bus_unit_queue_job (bus=0x7f0a26434b90,
message=0x7f0a26504ae0, u=0x7f0a264cb770,
type=_JOB_TYPE_MAX_IN_TRANSACTION, mode=JOB_FAIL, reload_if_possible=true,
error=0x7fff32c32140) at src/core/dbus-unit.c:777
path = 0x6 <error: Cannot access memory at address 0x6>
j = 0x4b33a1ae32564764
__PRETTY_FUNCTION__ = "bus_unit_queue_job"
#12 0x00007f0a24acd436 in bus_unit_method_start_generic (bus=0x7f0a26434b90,
message=0x7f0a26504ae0, u=0x7f0a264cb770,
job_type=_JOB_TYPE_MAX_IN_TRANSACTION,
reload_if_possible=reload_if_possible@entry=false, error=0x7fff32c32140)
at src/core/dbus-unit.c:383
smode = 0x7f0a265011d8 "fail"
mode = JOB_FAIL
r = 0
__PRETTY_FUNCTION__ = "bus_unit_method_start_generic"
#13 0x00007f0a24b3df56 in method_start_unit_generic (bus=0x7f0a26434b90,
message=0x7f0a26504ae0, m=0x7f0a26420380,
job_type=_JOB_TYPE_MAX_IN_TRANSACTION, reload_if_possible=<optimized out>,
error=0x7fff32c32140) at src/core/dbus-manager.c:478
name = 0x7f0a265011c4 "dnsmasq.service"
u = 0x7f0a264cb770
r = <optimized out>
__PRETTY_FUNCTION__ = "method_start_unit_generic"
#14 0x00007f0a24b2df8e in method_callbacks_run (found_object=<optimized out>,
require_fallback=<optimized out>, c=<optimized out>, m=<optimized out>,
bus=<optimized out>) at src/libsystemd/sd-bus/bus-objects.c:400
slot = 0x0
u = 0x7f0a26420380
error = {name = 0x0, message = 0x0, _need_free = 0}
signature = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>
#15 object_find_and_run.lto_priv.239 (bus=0x7f0a26434b90, m=0x7f0a26504ae0,
p=0x6 <error: Cannot access memory at address 0x6>,
require_fallback=false, found_object=0x7fff32c322b0)
at src/libsystemd/sd-bus/bus-objects.c:1224
vtable_key = {path = 0x7f0a26501128 "/org/freedesktop/systemd1",
interface = 0x7f0a26501168 "org.freedesktop.systemd1.Manager",
member = 0x7f0a26501150 "TryRestartUnit", parent = 0x1,
last_iteration = 0,
vtable = 0x7f0a24b42681 <bus_read_message.constprop.160+49>}
__PRETTY_FUNCTION__ = "object_find_and_run"
#16 0x00007f0a24b4386b in bus_process_object (m=<optimized out>,
bus=0x7f0a26434b90) at src/libsystemd/sd-bus/bus-objects.c:1340
prefix = 0x7fff32c321e0 "\220KC&\n\177"
r = <optimized out>
pl = <optimized out>
found_object = true
#17 process_message (m=<optimized out>, bus=0x7f0a26434b90)
at src/libsystemd/sd-bus/sd-bus.c:2409
r = <optimized out>
#18 process_running (priority=0, ret=0x0, hint_priority=false,
bus=0x7f0a26434b90) at src/libsystemd/sd-bus/sd-bus.c:2451
m = <optimized out>
r = <optimized out>
#19 bus_process_internal.constprop.164 (bus=0x7f0a26434b90, ret=0x0,
priority=0, hint_priority=false) at src/libsystemd/sd-bus/sd-bus.c:2640
_dont_destroy_bus = 0x7f0a26434b90
#20 0x00007f0a24ae1871 in sd_bus_process (ret=0x0, bus=<optimized out>)
at src/libsystemd/sd-bus/sd-bus.c:2659
No locals.
#21 io_callback (s=<optimized out>, fd=<optimized out>,
revents=<optimized out>, userdata=<optimized out>)
at src/libsystemd/sd-bus/sd-bus.c:2918
bus = <optimized out>
__PRETTY_FUNCTION__ = "io_callback"
#22 0x00007f0a24b0bdf0 in source_dispatch (s=0x7f0a264e1a00)
at src/libsystemd/sd-event/sd-event.c:2103
__PRETTY_FUNCTION__ = "source_dispatch"
__func__ = "source_dispatch"
#23 0x00007f0a24b113fe in sd_event_dispatch (e=0x7f0a264207b0)
at src/libsystemd/sd-event/sd-event.c:2460
p = <optimized out>
r = <optimized out>
e = 0x7f0a264207b0
__PRETTY_FUNCTION__ = "sd_event_dispatch"
#24 0x00007f0a24af8d20 in sd_event_run (timeout=<optimized out>,
e=<optimized out>) at src/libsystemd/sd-event/sd-event.c:2489
r = 0
#25 manager_loop (m=0x7f0a26420380) at src/core/manager.c:2010
wait_usec = 18446744073709551615
rl = {interval = 1000000, begin = 16571505, burst = 50000, num = 135}
__PRETTY_FUNCTION__ = "manager_loop"
__func__ = "manager_loop"
#26 0x00007f0a24aa4845 in main (argc=1, argv=<optimized out>)
at src/core/main.c:1731
m = 0x7f0a26420380
r = <optimized out>
retval = 1
before_startup = <optimized out>
after_startup = <optimized out>
timespan = "\310\001\000\000\000\000\000\000\310\001\000\000\000\000\000\000\310\001\000\000\000\000\000\000$\000\000\000\000\000\000\000$\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000P\345td\004\000\000\000\210+\001\000\000\000\000"
fds = 0x0
reexecute = false
shutdown_verb = 0x0
initrd_timestamp = {realtime = 0, monotonic = <optimized out>}
userspace_timestamp = {realtime = 1417111964157395,
monotonic = 1925963}
kernel_timestamp = {realtime = <optimized out>, monotonic = 0}
security_start_timestamp = {realtime = 1417111964159510,
monotonic = 1928079}
security_finish_timestamp = {realtime = 1417111964160112,
monotonic = 1928680}
systemd = "systemd"
skip_setup = <optimized out>
j = <optimized out>
loaded_policy = false
arm_reboot_watchdog = false
queue_default_job = <optimized out>
empty_etc = false
switch_root_dir = 0x0
switch_root_init = 0x0
__func__ = "main"
__PRETTY_FUNCTION__ = "main"
Yes, I should have mentioned that it happens more or less randomly. In my case it wasn't at shutdown. Any chance you could check if current git fixes your issue? transaction_is_destructive() has been change quite a bit, to fix a few bugs. Pretty sure this is fixed now. Closing bug. Feel free to reopen if problem persists. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.