On a device equipped with several modems, we could see frequent crashes of the ModemManager process. Frequent = more than every 30 minutes. Investigating on the hardware side (could well be the modems resetting because they're hot), but investigating why the ModemManager crashed, either. I've got kernel messages about the signal, but no backtrace yet. kernel: [ 6878.385210] ModemManager[13410]: segfault at 20 ip 00007f247f3eff15 sp 00007fff429de460 error 6 in libqmi-glib.so.5.1.0[7f247f397000+1b2000] Not enough elements yet to knwow whether the bug actually lies in ModemManager or in libqmi.
Which libqmi/ModemManager versions? I'd love to see a backtrace.
So do I, therefore I'm prepring it. However, it could just be an incoherence between libqmi-proxy and libqmi... I'm checking this.
(In reply to Benoît Donnette from comment #2) > So do I, therefore I'm prepring it. However, it could just be an incoherence > between libqmi-proxy and libqmi... > > I'm checking this. Does qmi-proxy also segfault? I've got some reports of qmi-proxy segfaults but couldn't reproduce them or get backtraces either
Yes, I've got either ModemManager or qmi-proxy crashes (possibly sometimes both). Both may crash with a SIGSEGV. I suspect this is due to using the Debian testing ModemManager and its dependencies, but the Debian 8.5 mainstream qmi-proxy. They ldd each to a different libqmi.
"Testing" ModemManager = 1.6.0 for us. I am pushing the qmi-proxy from testing as well. I'll have a huge field testing after that if it does no harm, so I'll let you know if the bug disappears. It's on a device with 2 to 4 Cat6 modems, using qmi.
Any update?
Well, yes. I used to have 2 different crashes : one was qmi-proxy trying to read at address 0, the other being ModemManager. I used to beleive they were connected : they're not. The qmi-proxy crash comes from a qmi-proxy cohabitating with a ModemManager/libqmi on heterogeneous versions of the libqmi (Debian packaging used not to prevent from this : you can have libqmi-proxy version 1.14.0 and libqmi-glib5 version 1.16.0). Upgrading libqmi-proxy to match libqmi-glib fixes. I'm now setting up core dumps on the devices for the second bug : a ModemManager crash while writing at address 0x20. And upgrading libqmi-proxy doesn't fix. I'll keep you informed, I just managed to have a coredump produced bu a systemd daemon, now deploying to get a test. A genuine coredump should arrive within 2 days.
(In reply to Benoît Donnette from comment #7) > Well, yes. > > I used to have 2 different crashes : one was qmi-proxy trying to read at > address 0, the other being ModemManager. I have my own qmi-proxy backtraces now for a quite easy to reproduce crash; I should be providing an update in the next few days.
ModemManager crash : It's actually in libqmi. In qmi-device.c. We're dereferencing a value after getting the transaction to be cancelled. No test if pointer gets NULL. I have a stacktrace (upload tomorrow).
(In reply to Benoît Donnette from comment #9) > ModemManager crash : > > It's actually in libqmi. In qmi-device.c. > > We're dereferencing a value after getting the transaction to be cancelled. > No test if pointer gets NULL. > > I have a stacktrace (upload tomorrow). See https://cgit.freedesktop.org/libqmi/commit/?id=0148e81aa978a5d94ef2e9ddf8adfefa7ce2ef3f I believe the commit may fix the issue you're seeing (transaction pointer is NULL on the cancellation handler because it isn't found in the internal tracking table). Could you validate this by using libqmi git master in your setup? I also added some other segfault fixes and several memleak fixes as well.
Hi, Building a package and teing today (this afternoon is quite possible). I'll have a feedback, maybe on Monday (I'm supposed to be off but I can access the devices), and worst case on Wednesday.
(In reply to Benoît Donnette from comment #11) > Hi, > > Building a package and teing today (this afternoon is quite possible). I'll > have a feedback, maybe on Monday (I'm supposed to be off but I can access > the devices), and worst case on Wednesday. Well, I still have crashes today after the patch, getting a backtrace of it.
Looks that my issue is in the transaction_cancelled() function, as the traces suggest the failing access is a dereference of an offset 0x20 after NULL, with a write access. This means it's a problem on the cancellable_id field. The stack trace showed a SIGSEGV in the transaction_cancelled function, that' the element I didn't put here. I'm adding a NULL test around the line : tr->cancellable_id = 0;
Any luck with the backtrace?
Well, sure ! As s
(In reply to Benoît Donnette from comment #15) > Well, sure ! > > As s Well sure, I've anayzed it, but not been able to field test modification yet. As said in my previous message, the crash takes place in : transaction_cancelled() I basically added a test against NULL in the result from the 2 remaining calls to device_release_transaction() Deployment on a sample of the fiend tomorrow.
Created attachment 128043 [details] Protects 2 more dereferences against NULL Tested successfully today. A usecase that triggered at least 3 crashes before has not shown a single crash.
(In reply to Benoît Donnette from comment #17) > Created attachment 128043 [details] > Protects 2 more dereferences against NULL > > Tested successfully today. A usecase that triggered at least 3 crashes > before has not shown a single crash. This patch solves the segfault, but I believe the proper fix is to make sure the timeout and cancellations get cleaned up whenever the transaction gets removed from the HT; if we don't make sure of that we may get bitten by the issue somewhere else.
Which device(s) where you using while testing this? Do you have any debug log that could indicate why the transaction was being cancelled?
Nevermind; your analysis was mostly right, we do get a NULL tr in the cancellation callback, but it isn't an error. This happens when the cancellable given to a QMI command is already cancelled when we create the transaction: https://cgit.freedesktop.org/libqmi/commit/?id=e57f062e666b9a0686e38e3722664b4b2ac9658c Fixed in libqmi git master, qmi-1-16, qmi-1-14 and qmi-1-12. Also fixed the equivalent issue in libmbim git master, mbim-1-12 and mbim-1-10. Thanks!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.