Bug 84994

Summary: mbim: handle NotOpened errors after suspend or hibernate
Product: ModemManager Reporter: Aleksander Morgado <aleksander>
Component: generalAssignee: ModemManager bug user <modemmanager>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: tore
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: Patch 1/3
Patch 2/3
Patch 3/3
Debug logs
Debug logs
MM debug log
mbim-proxy verbose output
MM debug log
mbim-proxy verbose output
MM debug log - WWAN-only testing
mbim-proxy verbose output - WWAN-only testing - pre-segfault
mbim-proxy verbose output - WWAN-only testing - post-segfault
ModemManager debug output cf. comment #35
mbim-proxy debug output cf. comment #35
mmcli -m $idx output cf. comment #35

Description Aleksander Morgado 2014-10-14 12:31:23 UTC
Originally reported at:
  https://bugzilla.gnome.org/show_bug.cgi?id=733086
Please refer to the original bug report if more details are needed.

When suspending, ModemManager keeps the state of the modem (MBIM open), but the modem itself gets fully reseted and needs an explicit new MBIM open, or otherwise the MBIM port will keep on saying that it was not opened:

  [/dev/cdc-wdm1] Received message (translated)...
  >>>>>> Header:
  >>>>>>   length      = 16
  >>>>>>   type        = function-error (0x80000004)
  >>>>>>   transaction = 12
  >>>>>> Contents:
  >>>>>>   error = 'NotOpened' (0x00000005)
Comment 1 Aleksander Morgado 2014-10-14 14:01:02 UTC
An additional report which is showing the same issue, but hibernating instead of suspending:

https://bugzilla.gnome.org/show_bug.cgi?id=725377
http://lists.freedesktop.org/archives/modemmanager-devel/2014-February/000893.html
Comment 2 Aleksander Morgado 2014-12-26 13:20:54 UTC
Created attachment 111370 [details] [review]
Patch 1/3
Comment 3 Aleksander Morgado 2014-12-26 13:21:16 UTC
Created attachment 111371 [details] [review]
Patch 2/3
Comment 4 Aleksander Morgado 2014-12-26 13:21:36 UTC
Created attachment 111372 [details] [review]
Patch 3/3
Comment 5 Aleksander Morgado 2014-12-26 13:22:40 UTC
Tore, can you apply those 3 patches and see if you get the issue fixed? The outcome should be that once MM detects that the port isn't opened, it will mark the modem as invalid and it should then re-probe it.
Comment 6 Aleksander Morgado 2014-12-26 13:28:34 UTC
(In reply to Aleksander Morgado from comment #5)
> Tore, can you apply those 3 patches and see if you get the issue fixed? The
> outcome should be that once MM detects that the port isn't opened, it will
> mark the modem as invalid and it should then re-probe it.

Wait... not yet, I'll need to review it because when marked as invalid, the modem ports are not re-probed...
Comment 7 Aleksander Morgado 2014-12-26 15:32:15 UTC
(In reply to Aleksander Morgado from comment #6)
> (In reply to Aleksander Morgado from comment #5)
> > Tore, can you apply those 3 patches and see if you get the issue fixed? The
> > outcome should be that once MM detects that the port isn't opened, it will
> > mark the modem as invalid and it should then re-probe it.
> 
> Wait... not yet, I'll need to review it because when marked as invalid, the
> modem ports are not re-probed...

Tore, I pushed a new "aleksander/mbim-not-open" branch to upstream git; could you check it and tell me if that solves the issue? Debug logs would also help.
Comment 8 Tore Anderson 2014-12-26 22:58:32 UTC
Created attachment 111380 [details]
Debug logs

Hi, I tested the branch (df06d3a7), it doesn't fix the problem. Attaching an debug log. Timestamps of various activities so you can correlate with the log:

23:41:23 ModemManager started
23:42:33 Test SMS successfully sent (using mmcli)
23:43:16 System suspended
23:43:41 System resumed
23:44:13 Test SMS failed to send, error message: «couldn't send the SMS: 'GDBus.Error:org.freedesktop.libmbim.Error.Core.Timeout: Couldn't send SMS part: Transaction timed out'»
23:45:30 Attempted to connect to mobile broadband using NetworkManager, fails with «Couldn't connect bearer: 'Transaction timed out'»

After restarting ModemManager I can successfully send SMS and connect to mobile broadband again.
Comment 9 Aleksander Morgado 2014-12-27 10:08:50 UTC
(In reply to Tore Anderson from comment #8)
> Created attachment 111380 [details]
> Debug logs
> 
> Hi, I tested the branch (df06d3a7), it doesn't fix the problem. Attaching an
> debug log. Timestamps of various activities so you can correlate with the
> log:
> 
> 23:41:23 ModemManager started
> 23:42:33 Test SMS successfully sent (using mmcli)
> 23:43:16 System suspended
> 23:43:41 System resumed
> 23:44:13 Test SMS failed to send, error message: «couldn't send the SMS:
> 'GDBus.Error:org.freedesktop.libmbim.Error.Core.Timeout: Couldn't send SMS
> part: Transaction timed out'»
> 23:45:30 Attempted to connect to mobile broadband using NetworkManager,
> fails with «Couldn't connect bearer: 'Transaction timed out'»
> 
> After restarting ModemManager I can successfully send SMS and connect to
> mobile broadband again.

Gah... this is very unfortunate. We no longer get NotOpened errors; instead the modem doesn't even reply at all after suspend, and I guess that a new MBIM re-open solves it. One thing that can be done is to mark the modem as invalid when we get N consecutive timeouts in commands; like what we do with RS232 modems. Another possibility would be to explicitly listen for suspend/hibernate events via DBus, and detect when a modem doesn't respond after such events.

It probably is worth to go on and start implementing logic for the latter case (monitoring suspend/hibernate), as I believe we could make use of that info on more situations...

Anyway, checking consecutive timeouts is a quick hack we can try as well. Timeouts in MBIM are very rare, I would say, so just a threshold of 2 could be enough as a temporary solution until we monitor PM events.
Comment 10 Aleksander Morgado 2014-12-27 11:12:01 UTC
(In reply to Aleksander Morgado from comment #9)
> (In reply to Tore Anderson from comment #8)
> > Created attachment 111380 [details]
> > Debug logs
> > 
> > Hi, I tested the branch (df06d3a7), it doesn't fix the problem. Attaching an
> > debug log. Timestamps of various activities so you can correlate with the
> > log:
> > 
> > 23:41:23 ModemManager started
> > 23:42:33 Test SMS successfully sent (using mmcli)
> > 23:43:16 System suspended
> > 23:43:41 System resumed
> > 23:44:13 Test SMS failed to send, error message: «couldn't send the SMS:
> > 'GDBus.Error:org.freedesktop.libmbim.Error.Core.Timeout: Couldn't send SMS
> > part: Transaction timed out'»
> > 23:45:30 Attempted to connect to mobile broadband using NetworkManager,
> > fails with «Couldn't connect bearer: 'Transaction timed out'»
> > 
> > After restarting ModemManager I can successfully send SMS and connect to
> > mobile broadband again.
> 
> Gah... this is very unfortunate. We no longer get NotOpened errors; instead
> the modem doesn't even reply at all after suspend, and I guess that a new
> MBIM re-open solves it. One thing that can be done is to mark the modem as
> invalid when we get N consecutive timeouts in commands; like what we do with
> RS232 modems. Another possibility would be to explicitly listen for
> suspend/hibernate events via DBus, and detect when a modem doesn't respond
> after such events.
> 
> It probably is worth to go on and start implementing logic for the latter
> case (monitoring suspend/hibernate), as I believe we could make use of that
> info on more situations...
> 
> Anyway, checking consecutive timeouts is a quick hack we can try as well.
> Timeouts in MBIM are very rare, I would say, so just a threshold of 2 could
> be enough as a temporary solution until we monitor PM events.

Pushed a new branch, named "aleksander/mbim-not-open-2". Could you test that one? If we detect not-opened errors OR 2 or more consecutive MBIM command timeouts, we'll fully re-probe the modem.
Comment 11 Tore Anderson 2014-12-27 12:20:06 UTC
Created attachment 111396 [details]
Debug logs

Doesn't quite do the trick either:

12:59:56 Booted system
13:05:12 SMS test #1 OK
13:06:52 Suspending system
13:07:08 System resumed
13:08:15 SMS test #2 fails: error: couldn't send the SMS: 'GDBus.Error:org.freedesktop.libmbim.Error.Core.Timeout: Couldn't send SMS part: Transaction timed out'
13:09:41 Trying to connect mobile broadband: fails
13:10:11 MBIM timeout occurs and MM tries to re-init the modem, but this ends up failing too:
<warn>  Modem couldn't be initialized: couldn't load current capabilities: Transaction timed out
Comment 12 Aleksander Morgado 2014-12-28 08:57:46 UTC
(In reply to Tore Anderson from comment #11)
> Created attachment 111396 [details]
> Debug logs
> 
> Doesn't quite do the trick either:
> 
> 12:59:56 Booted system
> 13:05:12 SMS test #1 OK
> 13:06:52 Suspending system
> 13:07:08 System resumed
> 13:08:15 SMS test #2 fails: error: couldn't send the SMS:
> 'GDBus.Error:org.freedesktop.libmbim.Error.Core.Timeout: Couldn't send SMS
> part: Transaction timed out'
> 13:09:41 Trying to connect mobile broadband: fails
> 13:10:11 MBIM timeout occurs and MM tries to re-init the modem, but this
> ends up failing too:
> <warn>  Modem couldn't be initialized: couldn't load current capabilities:
> Transaction timed out

We're getting closer :) Now it's the proxy the one in the middle. I wonder if it is the proxy itself the one getting the NotOpened errors and whether if it is the one causing the timeout errors in ModemManager.

Could you retry the same branch, but this time running the proxy itself in another terminal? E.g.:

term1 $> sudo /usr/libexec/mbim-proxy --verbose

term2 $> sudo /usr/sbin/ModemManager --debug
Comment 13 Tore Anderson 2014-12-28 11:02:35 UTC
Created attachment 111422 [details]
MM debug log

Certainly!

11:52:26 Started MM
11:53:48 SMS #1 - OK
11:54:33 Suspended system
11:54:54 Resumed system
11:55:49 SMS #2 - fails (error: couldn't send the SMS: 'GDBus.Error:org.freedesktop.libmbim.Error.Core.Timeout: Couldn't send SMS part: Transaction timed out')
11:56:45 WWAN connection attempt - fails: (<warn>  MBIM transactions timed out, reporting invalid modem...)

Afterwards, it tries to re-init the modem, but all the info is wrong:

/org/freedesktop/ModemManager1/Modem/1 (device id 'unknown')
  -------------------------
  Hardware |   manufacturer: 'unknown'
           |          model: 'unknown'
           |       revision: 'unknown'
           |      supported: 'none'
           |        current: 'none'
           |   equipment id: 'unknown'
  -------------------------
  System   |         device: '/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.5'
           |        drivers: 'cdc_acm, cdc_mbim, cdc_wdm'
           |         plugin: 'Ericsson MBM'
           |   primary port: 'cdc-wdm2'
           |          ports: 'ttyACM1 (at), wwp0s26u1u5i6 (net), ttyACM0 (at), cdc-wdm2 (mbim), ttyACM2 (at)'
  -------------------------
  Numbers  |           own : 'unknown'
  -------------------------
  Status   |           lock: 'unknown'
           | unlock retries: 'unknown'
           |          state: 'unknown'
           |    power state: 'unknown'
           |    access tech: 'unknown'
           | signal quality: '0' (cached)
  -------------------------
  Modes    |      supported: 'allowed: any; preferred: none'
           |        current: 'allowed: any; preferred: none'
  -------------------------
  Bands    |      supported: 'unknown'
           |        current: 'unknown'
  -------------------------
  IP       |      supported: 'none'
  -------------------------
  SIM      |           path: 'none'

  -------------------------
  Bearers  |          paths: 'none'

11:59:08 Stopped MM

Tore
Comment 14 Tore Anderson 2014-12-28 11:02:54 UTC
Created attachment 111423 [details]
mbim-proxy verbose output
Comment 15 Aleksander Morgado 2014-12-28 11:37:39 UTC
(In reply to Tore Anderson from comment #14)
> Created attachment 111423 [details]
> mbim-proxy verbose output

Ah, there they are the NotOpened errors.

The thing now is that MM tries to re-open the device, but the proxy still has the same /dev/cdc-wdm port flagged as being open, while it really isn't after the suspend/resume. We now need a libmbim fix to handle this case.
Comment 16 Aleksander Morgado 2014-12-28 17:55:25 UTC
Ok, next try.

I've pushed two branches, one for libmbim named "aleksander/function-error-in-transaction" and another one in ModemManager named "aleksander/mbim-not-open-3". Could you retry with these and get me debug logs as before both in the proxy and in MM?
Comment 17 Tore Anderson 2014-12-29 13:05:37 UTC
(In reply to Aleksander Morgado from comment #16)

> I've pushed two branches, one for libmbim named
> "aleksander/function-error-in-transaction" and another one in ModemManager
> named "aleksander/mbim-not-open-3".

I don't see the 'mbim-not-open-3' branch:

http://cgit.freedesktop.org/ModemManager/ModemManager/log/?h=aleksander/mbim-not-open-3

Could you double check if you pushed it correctly?

Tore
Comment 18 Aleksander Morgado 2014-12-29 13:14:30 UTC
(In reply to Tore Anderson from comment #17)
> (In reply to Aleksander Morgado from comment #16)
> 
> > I've pushed two branches, one for libmbim named
> > "aleksander/function-error-in-transaction" and another one in ModemManager
> > named "aleksander/mbim-not-open-3".
> 
> I don't see the 'mbim-not-open-3' branch:
> 
> http://cgit.freedesktop.org/ModemManager/ModemManager/log/?h=aleksander/mbim-
> not-open-3
> 
> Could you double check if you pushed it correctly?
> 
> Tore

Sorry, pushed now.
Comment 19 Tore Anderson 2014-12-29 16:09:12 UTC
Clear signs of progress now, I can connect to mobile broadband after resuming, but I still couldn't send an SMS.

16:15:12 Booted system
16:17:21 Test SMS - OK
16:18:57 Suspended system
16:45:36 Resumed system
16:49:38 Test SMS - fails: error: couldn't send the SMS: 'GDBus.Error:org.freedesktop.libmbim.Error.Protocol.NotOpened: Couldn't send SMS part: MBIM protocol error: NotOpened'
16:57:33 Activating WWAN connection from NM - OK
16:58:09 Deactivated WWAN
16:58:25 Stopped MM
Comment 20 Tore Anderson 2014-12-29 16:09:31 UTC
Created attachment 111473 [details]
MM debug log
Comment 21 Tore Anderson 2014-12-29 16:09:50 UTC
Created attachment 111474 [details]
mbim-proxy verbose output
Comment 22 Aleksander Morgado 2014-12-29 16:24:43 UTC
(In reply to Tore Anderson from comment #19)
> Clear signs of progress now, I can connect to mobile broadband after
> resuming, but I still couldn't send an SMS.
> 
> 16:15:12 Booted system
> 16:17:21 Test SMS - OK
> 16:18:57 Suspended system
> 16:45:36 Resumed system
> 16:49:38 Test SMS - fails: error: couldn't send the SMS:
> 'GDBus.Error:org.freedesktop.libmbim.Error.Protocol.NotOpened: Couldn't send
> SMS part: MBIM protocol error: NotOpened'
> 16:57:33 Activating WWAN connection from NM - OK
> 16:58:09 Deactivated WWAN
> 16:58:25 Stopped MM

I think that could actually be the expected result. The thing here is that ModemManager will, *as soon as it detects* that the MBIM device is not open, fully reprobe it. The key point here is the "as soon as it detects" it. In your test, you tried to send an SMS after resuming from suspend, and it was in that same moment when ModemManager removed the modem and re-created it from scratch (remember that the state of the modem was reseted). So, there will be at least 1 action which will error out with "Not Opened": this can either be a user action (like your SMS) or another automatic action (like a signal quality check request).

If you confirm that *after* seeing the modem recreated by MM you can re-connect and re-send SMS, then I think we can conclude this is fixed.
Comment 23 Tore Anderson 2014-12-29 16:32:05 UTC
I did some more testing, but only with WWAN connections this time:

17:17:03 Started MM
17:18:30 Activated WWAN - OK
17:18:44 Disconnected WWAN - OK
17:19:09 Suspended system
17:19:18 Resumed system
17:20:40 Activated WWAN - failed (failed to connect modem: MBIM protocol error: NotOpened)
17:21:34 Activated WWAN - failed (failed to connect modem: Message did not receive a reply (timeout by message bus)
17:21:39 ModemManager segfaults and is restarted by systemd
17:22:20 Activated WWAN - failed (failed to connect modem: MaxActivatedContexts)
17:23:10 Activated WWAN - failed (failed to connect modem: MaxActivatedContexts)
17:23:19 Activated WWAN - failed (failed to connect modem: MaxActivatedContexts)
17:26:01 Stopped ModemManager

Anyway, from my "dumb user" point of view I think it would be ideal that these NotOpened errors never propagate up so that my user-requested actions (whether that is sending an SMS or connecting to WWAN) never fail because of them. While I certainly can live with having to click the connect button twice in order to get connected after resume, it would be better if I only had to click it once (and possibly have to wait a little bit longer while the modem was re-initialised if necessary).
Comment 24 Tore Anderson 2014-12-29 16:32:50 UTC
Created attachment 111476 [details]
MM debug log - WWAN-only testing
Comment 25 Tore Anderson 2014-12-29 16:33:15 UTC
Created attachment 111477 [details]
mbim-proxy verbose output - WWAN-only testing - pre-segfault
Comment 26 Tore Anderson 2014-12-29 16:33:51 UTC
Created attachment 111478 [details]
mbim-proxy verbose output - WWAN-only testing - post-segfault
Comment 27 Aleksander Morgado 2014-12-29 16:39:54 UTC
(In reply to Tore Anderson from comment #23)
> I did some more testing, but only with WWAN connections this time:
> 
> 17:17:03 Started MM
> 17:18:30 Activated WWAN - OK
> 17:18:44 Disconnected WWAN - OK
> 17:19:09 Suspended system
> 17:19:18 Resumed system
> 17:20:40 Activated WWAN - failed (failed to connect modem: MBIM protocol
> error: NotOpened)
> 17:21:34 Activated WWAN - failed (failed to connect modem: Message did not
> receive a reply (timeout by message bus)
> 17:21:39 ModemManager segfaults and is restarted by systemd
> 17:22:20 Activated WWAN - failed (failed to connect modem:
> MaxActivatedContexts)
> 17:23:10 Activated WWAN - failed (failed to connect modem:
> MaxActivatedContexts)
> 17:23:19 Activated WWAN - failed (failed to connect modem:
> MaxActivatedContexts)
> 17:26:01 Stopped ModemManager
> 
> Anyway, from my "dumb user" point of view I think it would be ideal that
> these NotOpened errors never propagate up so that my user-requested actions
> (whether that is sending an SMS or connecting to WWAN) never fail because of
> them. While I certainly can live with having to click the connect button
> twice in order to get connected after resume, it would be better if I only
> had to click it once (and possibly have to wait a little bit longer while
> the modem was re-initialised if necessary).

We currently cannot make that action succeed with the current codebase. The action errors out with NotOpened, and we end up fully recreating the Modem object. The newly created modem knows anything about that action which was sent to that other modem object (which had actually a different DBus object path). The newly created modem object has nothing to do with the old modem object... So either we just propagate the NotOpened error as we do now, or we end up timing out that connection attempt.

The MaxActivatedContexts is yet another thing to consider I think... Could be that the modem is connected at some point but ModemManager doesn't assume that and tries to re-connect again.

And the segfault is very unfortunate; I wonder if you could get some backtrace just of that one. Anyway, I think I'll need to spend some more time reviewing the logic. The fact that I cannot reproduce this, even if I have a Ericsson MBIM modem in my thinkpad, is also unfortunate :/
Comment 28 Tore Anderson 2014-12-29 16:51:15 UTC
(In reply to Aleksander Morgado from comment #22)
> this can either be a user action (like your SMS) or another automatic action
> (like a signal quality check request).

FWIW I don't think the automatic signal quality checks are sufficient to trigger the re-initialisation of the modem. When I suspend/resume without doing any "user actions" afterwards, all I see in the log is this:

<debug> Signal quality value not updated in 60s, marking as not being recent

I'll try to get a backtrace of the segfault...
Comment 29 Tore Anderson 2014-12-29 19:27:53 UTC
Here's a backtrace of the crash. I started ModemManager, suspended, resumed, then tried to up a WWAN connection which failed with "Error: Connection activation failed: Active connection removed before it was initialized" from nmcli, then I tried to up it again, and MM segfaulted (the nmcli command timed out after 90 secs).

[root@envy ~]# gdb /usr/sbin/ModemManager 
GNU gdb (GDB) Fedora 7.8.1-36.fc21
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/ModemManager...Reading symbols from /usr/lib/debug/usr/sbin/ModemManager.debug...done.
done.
(gdb) run
Starting program: /usr/sbin/ModemManager 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff4e92700 (LWP 14479)]
ModemManager[14475]: <info>  ModemManager (version 1.5.0-0.2.fc21) starting in system bus...
[New Thread 0x7fffeffff700 (LWP 14480)]
[New Thread 0x7fffef7fe700 (LWP 14481)]
ModemManager[14475]: opening device...
ModemManager[14475]: cannot connect to proxy: Could not connect: Connection refused
ModemManager[14475]: spawning new mbim-proxy (try 1)...
Detaching after fork from child process 14483.
ModemManager[14475]: [/dev/cdc-wdm1] Read max control message size from descriptors file: 512
ModemManager[14475]: <info>  Creating modem with plugin 'Ericsson MBM' and '7' ports
ModemManager[14475]: <info>  Modem for device at '/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.5' successfully created
ModemManager[14475]: opening device...
ModemManager[14475]: [/dev/cdc-wdm1] Read max control message size from descriptors file: 512
ModemManager[14475]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:19.0': not supported by any plugin
ModemManager[14475]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00:1c.3/0000:03:00.0': not supported by any plugin
ModemManager[14475]: <info>  Modem: state changed (unknown -> disabled)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: state changed (disabled -> enabling)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: 3GPP Registration state changed (unknown -> searching)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: state changed (enabling -> searching)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: 3GPP Registration state changed (searching -> registering)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: 3GPP location updated (MCC: '242', MNC: '1', Location area code: '0', Cell ID: '0')
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: 3GPP Registration state changed (registering -> home)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: state changed (searching -> registered)
[Thread 0x7fffeffff700 (LWP 14480) exited]
ModemManager[14475]: <info>  Creating modem with plugin 'Generic' and '1' ports
ModemManager[14475]: <warn>  Could not grab port (tty/ttyS0): 'Cannot add port 'tty/ttyS0', unhandled serial type'
ModemManager[14475]: <warn>  Couldn't create modem for device at '/sys/devices/pci0000:00/0000:00:16.3': Failed to find primary AT port
ModemManager[14475]: <info>  Simple connect started...
ModemManager[14475]: <info>  Simple connect state (4/8): Wait to get fully enabled
ModemManager[14475]: <info>  Simple connect state (5/8): Register
ModemManager[14475]: <info>  Simple connect state (6/8): Bearer
ModemManager[14475]: <info>  Simple connect state (7/8): Connect
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: state changed (registered -> connecting)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: state changed (connecting -> connected)
ModemManager[14475]: <info>  Simple connect state (8/8): All done
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: state changed (connected -> disconnecting)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: state changed (disconnecting -> registered)
ModemManager[14475]: <info>  Simple connect started...
ModemManager[14475]: <info>  Simple connect state (4/8): Wait to get fully enabled
ModemManager[14475]: <info>  Simple connect state (5/8): Register
ModemManager[14475]: <info>  Simple connect state (6/8): Bearer
ModemManager[14475]: <info>  Simple connect state (7/8): Connect
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: state changed (registered -> connecting)
ModemManager[14475]: <warn>  MBIM device is not opened, reporting invalid modem...
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/0: state changed (connecting -> registered)
[New Thread 0x7fffeffff700 (LWP 14831)]
ModemManager[14475]: opening device...
ModemManager[14475]: [/dev/cdc-wdm1] Read max control message size from descriptors file: 512

(ModemManager:14475): GLib-GObject-WARNING **: invalid unclassed pointer in cast to 'MMIfaceModem3gpp'

(ModemManager:14475): GLib-GObject-CRITICAL **: g_object_get: assertion 'G_IS_OBJECT (object)' failed

(ModemManager:14475): GLib-GObject-WARNING **: invalid unclassed pointer in cast to 'MMIfaceModem3gpp'

(ModemManager:14475): GLib-GObject-WARNING **: invalid unclassed pointer in cast to 'GObject'

(ModemManager:14475): GLib-GObject-CRITICAL **: g_object_get_qdata: assertion 'G_IS_OBJECT (object)' failed

(ModemManager:14475): GLib-GObject-WARNING **: invalid unclassed pointer in cast to 'GObject'

(ModemManager:14475): GLib-GObject-CRITICAL **: g_object_set_qdata_full: assertion 'G_IS_OBJECT (object)' failed

(ModemManager:14475): GLib-GObject-CRITICAL **: g_object_get: assertion 'G_IS_OBJECT (object)' failed

(ModemManager:14475): GLib-GObject-WARNING **: invalid unclassed pointer in cast to 'MMIfaceModem'

(ModemManager:14475): GLib-GObject-CRITICAL **: g_object_get: assertion 'G_IS_OBJECT (object)' failed
ModemManager[14475]: <info>  Creating modem with plugin 'Ericsson MBM' and '7' ports
ModemManager[14475]: <info>  Modem for device at '/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.5' successfully created
ModemManager[14475]: opening device...
ModemManager[14475]: [/dev/cdc-wdm1] Read max control message size from descriptors file: 512
ModemManager[14475]: <info>  Modem: state changed (unknown -> disabled)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/1: state changed (disabled -> enabling)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/1: 3GPP Registration state changed (unknown -> registering)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/1: 3GPP Registration state changed (registering -> home)
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/1: 3GPP location updated (MCC: '242', MNC: '1', Location area code: '0', Cell ID: '0')
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/1: state changed (enabling -> registered)

(ModemManager:14475): GLib-GObject-WARNING **: invalid unclassed pointer in cast to 'MMIfaceModem'

(ModemManager:14475): GLib-GObject-CRITICAL **: g_object_get: assertion 'G_IS_OBJECT (object)' failed
[Thread 0x7fffeffff700 (LWP 14831) exited]
ModemManager[14475]: <info>  Simple connect started...
ModemManager[14475]: <info>  Simple connect state (4/8): Wait to get fully enabled
ModemManager[14475]: <info>  Simple connect state (5/8): Register
ModemManager[14475]: <info>  Simple connect state (6/8): Bearer
ModemManager[14475]: <info>  Simple connect state (7/8): Connect
ModemManager[14475]: <info>  Modem /org/freedesktop/ModemManager1/Modem/1: state changed (registered -> connecting)

(ModemManager:14475): GLib-GObject-CRITICAL **: g_object_get: assertion 'G_IS_OBJECT (object)' failed

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7152eb5 in g_type_check_instance_is_fundamentally_a (type_instance=type_instance@entry=0x41, fundamental_type=fundamental_type@entry=80) at gtype.c:3979
3979	  if (!type_instance || !type_instance->g_class)
(gdb) bt full
#0  0x00007ffff7152eb5 in g_type_check_instance_is_fundamentally_a (type_instance=type_instance@entry=0x41, fundamental_type=fundamental_type@entry=80) at gtype.c:3979
        node = <optimized out>
#1  0x00007ffff7134bd7 in g_object_unref (_object=0x41) at gobject.c:3067
        _g_boolean_var_ = <optimized out>
        object = 0x41
#2  0x00005555555caa2b in device_notification_cb (notification=0x5555558d8720, self=0x5555558a0310) at mm-broadband-modem-mbim.c:1730
        session_id = 0
        activation_state = MBIM_ACTIVATION_STATE_ACTIVATING
        context_type = 0x5555558daecc
        bearer_list = 0x41
        service = MBIM_SERVICE_BASIC_CONNECT
        __FUNCTION__ = "device_notification_cb"
#3  0x00005555555caa2b in device_notification_cb (notification=0x5555558d8720, self=0x5555558a0310) at mm-broadband-modem-mbim.c:1838
        service = MBIM_SERVICE_BASIC_CONNECT
        __FUNCTION__ = "device_notification_cb"
#4  0x00005555555caa2b in device_notification_cb (device=<optimized out>, notification=0x5555558d8720, self=0x5555558a0310) at mm-broadband-modem-mbim.c:1973
        service = MBIM_SERVICE_BASIC_CONNECT
        __FUNCTION__ = "device_notification_cb"
#5  0x00007ffff7132d13 in g_cclosure_marshal_VOID__BOXEDv (closure=0x5555558c9270, return_value=<optimized out>, instance=<optimized out>, args=<optimized out>, marshal_data=<optimized out>, n_params=<optimized out>, param_types=0x55555587e120) at gmarshal.c:1160
        cc = 0x5555558c9270
        data1 = <optimized out>
        data2 = <optimized out>
        callback = <optimized out>
        arg0 = 0x5555558d8720
        args_copy = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7fffffffe180, reg_save_area = 0x7fffffffe0c0}}
#6  0x00007ffff712ff64 in _g_closure_invoke_va (closure=closure@entry=0x5555558c9270, return_value=return_value@entry=0x0, instance=instance@entry=0x7fffe8004f80, args=args@entry=0x7fffffffe0a0, n_params=<optimized out>, param_types=0x55555587e120) at gclosure.c:831
        marshal = <optimized out>
        marshal_data = <optimized out>
        in_marshal = 0
        real_closure = 0x5555558c9250
        __FUNCTION__ = "_g_closure_invoke_va"
#7  0x00007ffff7149b60 in g_signal_emit_valist (instance=0x7fffe8004f80, signal_id=<optimized out>, detail=0, var_args=var_args@entry=0x7fffffffe0a0) at gsignal.c:3218
        return_accu = 0x0
        accu = 
              {g_type = 0, data = {{v_int = 0, v_uint = 0, v_long = 0, v_ulong = 0, v_int64 = 0, v_uint64 = 0, v_float = 0, v_double = 0, v_pointer = 0x0}, {v_int = 0, v_uint = 0, v_long = 0, v_ulong = 0, v_int64 = 0, v_uint64 = 0, v_float = 0, v_double = 0, v_pointer = 0x0}}}
        accumulator = 0x0
        emission = {next = 0x0, instance = 0x7fffe8004f80, ihint = {signal_id = 19, detail = 0, run_type = G_SIGNAL_RUN_FIRST}, state = EMISSION_RUN, chain_type = 93824995638656}
        signal_id = 19
        instance_type = 93824995638656
        emission_return = 
              {g_type = 0, data = {{v_int = 0, v_uint = 0, v_long = 0, v_ulong = 0, v_int64 = 0, v_uint64 = 0, v_float = 0, v_double = 0, v_pointer = 0x0}, {v_int = 0, v_uint = 0, v_long = 0, v_ulong = 0, v_int64 = 0, v_uint64 = 0, v_float = 0, v_double = 0, v_pointer = 0x0}}}
        rtype = 4
        static_scope = 0
        fastpath_handler = <optimized out>
        closure = 0x5555558c9270
        run_type = <optimized out>
        l = <optimized out>
        fastpath = <optimized out>
        instance_and_params = <optimized out>
        signal_return_type = <optimized out>
        param_values = <optimized out>
        node = <optimized out>
        i = <optimized out>
        n_params = <optimized out>
        __FUNCTION__ = "g_signal_emit_valist"
#8  0x00007ffff714a3af in g_signal_emit (instance=instance@entry=0x7fffe8004f80, signal_id=<optimized out>, detail=detail@entry=0) at gsignal.c:3365
        var_args = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7fffffffe180, reg_save_area = 0x7fffffffe0c0}}
#9  0x00007ffff76fc200 in indication_ready (self=0x7fffe8004f80 [MbimDevice], res=0x7fffe800b0f0) at mbim-device.c:436
        error = 0x0
        indication = <optimized out>
---Type <return> to continue, or q <return> to quit---
#10 0x00007ffff73e3807 in g_simple_async_result_complete (simple=0x7fffe800b0f0 [GSimpleAsyncResult]) at gsimpleasyncresult.c:763
        current_source = 0x5555558dafb0
        current_context = <optimized out>
        __FUNCTION__ = "g_simple_async_result_complete"
#11 0x00007ffff73e3869 in complete_in_idle_cb (data=<optimized out>) at gsimpleasyncresult.c:775
        simple = <optimized out>
#12 0x00007ffff6e30aeb in g_main_context_dispatch (context=0x555555836a00) at gmain.c:3111
        dispatch = 0x7ffff6e2d630 <g_idle_dispatch>
        prev_source = 0x0
        was_in_call = 0
        user_data = 0x7fffe800b0f0
        callback = 0x7ffff73e3860 <complete_in_idle_cb>
        cb_funcs = 0x7ffff711e8c0 <g_source_callback_funcs>
        cb_data = 0x5555558db050
        need_destroy = <optimized out>
        source = 0x5555558dafb0
        current = 0x5555558345b0
        i = 0
#13 0x00007ffff6e30aeb in g_main_context_dispatch (context=context@entry=0x555555836a00) at gmain.c:3710
#14 0x00007ffff6e30e88 in g_main_context_iterate (context=0x555555836a00, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3781
        max_priority = 0
        timeout = 0
        some_ready = 1
        nfds = <optimized out>
        allocated_nfds = 9
        fds = 0x555555896300
#15 0x00007ffff6e311b2 in g_main_loop_run (loop=0x55555583b050) at gmain.c:3975
        __FUNCTION__ = "g_main_loop_run"
#16 0x0000555555577f59 in main (argc=<optimized out>, argv=<optimized out>) at main.c:150
        inner = <optimized out>
        err = 0x0
        name_id = 1
        __FUNCTION__ = "main"
Comment 30 Aleksander Morgado 2014-12-30 08:52:20 UTC
Thanks for the backtrace and logs, it clearly is some missing reference to the modem object somewhere, hopefully in the new code.

Also, I think it's time we end up plugging in ModemManager to PM events; I'll go and do that as well as part of this branch. It's already too much for a stable bugfix release, so it'll go only to git master for now.
Comment 31 Tore Anderson 2014-12-30 09:23:11 UTC
(In reply to Aleksander Morgado from comment #30)
> Thanks for the backtrace and logs, it clearly is some missing reference to
> the modem object somewhere, hopefully in the new code.

Ack, let me know if I can provide more debug info or anything else.

> Also, I think it's time we end up plugging in ModemManager to PM events;
> I'll go and do that as well as part of this branch. It's already too much
> for a stable bugfix release, so it'll go only to git master for now.

That sounds like a sensible thing to do. If you make MM to a resume/thaw event by doing something active that triggers the NotOpened error and subsequent reprobing, then I think that would be as good as it gets.

You could, for example, explicitly probe for network registration status and signal strength. I think it makes sense to do so immediately after resume/thaw even if you don't care aout the NotOpened stuff, because the laptop might have moved a great distance while it's been suspended/hibernated and so the previous information from before the suspend/hibernation might be completely incorrect.

Tore
Comment 32 Aleksander Morgado 2014-12-30 10:14:32 UTC
> > Also, I think it's time we end up plugging in ModemManager to PM events;
> > I'll go and do that as well as part of this branch. It's already too much
> > for a stable bugfix release, so it'll go only to git master for now.
> 
> That sounds like a sensible thing to do. If you make MM to a resume/thaw
> event by doing something active that triggers the NotOpened error and
> subsequent reprobing, then I think that would be as good as it gets.
> 
> You could, for example, explicitly probe for network registration status and
> signal strength. I think it makes sense to do so immediately after
> resume/thaw even if you don't care aout the NotOpened stuff, because the
> laptop might have moved a great distance while it's been
> suspended/hibernated and so the previous information from before the
> suspend/hibernation might be completely incorrect.
> 

That is very true. My only concern here is that in most cases what we do see during a suspend/resume is the modem ports (either USB TTYs or cdc-wdm ports) being removed from the kernel, and re-created. In that case the modem is already probed by default from scratch, so that's good; it's this case you're reporting the only one I know of (if I'm not mistaken) where the control ports aren't removed/recreated during the cycle. So, an explicit signal or registration check is a good idea, but we also need to make sure that these actions get properly cancelled if the ports end up getting removed during the cycle, which is the most common case. I'll try to draft something :)
Comment 33 Aleksander Morgado 2015-01-16 09:01:02 UTC
Tore, could you retry the suspend/resume tests with libmbim 1.12 and the 'aleksander/suspend-resume' branch in ModemManager git?
Comment 34 Aleksander Morgado 2015-01-16 12:49:58 UTC
(In reply to Aleksander Morgado from comment #33)
> Tore, could you retry the suspend/resume tests with libmbim 1.12 and the
> 'aleksander/suspend-resume' branch in ModemManager git?

BTW, please configure the branch using --with-suspend-resume=systemd (or --with-suspend-resume=upower if you're not using systemd).
Comment 35 Tore Anderson 2015-01-17 12:45:50 UTC
Tested it, there is some clear progress but it's not 100% reliable yet. The modem does get re-initialised/discovered automatically upon resume (and assigned a new index), but sometimes this fails and the modem ends up in an defective state. I'll attach debug files when I reproduce this, after the first suspend/resume cycle it is broken, but after another suspend/resume it works again.

Timeline for correlating with the log files:

13:26:54 MM started after initial boot

mmcli -m 0 at this point looks OK.

13:28:31 System suspended
13:28:48 System resumed

mmcli -m 1 at this point does not look OK, most of the information is "unknown" or "none". I did nothing to actively make MM re-discover the modem. Sending SMS fails with "error: modem not enabled yet", and the NetworkManager systray applet indicates that mobile broadband is disabled, and won't allow me to try to establish WWAN connections.

13:31:30 System suspended
13:31:47 System resumed

mmcli -m 2 at this point looks OK again. Again, I did nothing to actively make M
M re-discover the modem.

13:32:34 Test SMS sent - OK!
13:33:09 Estalished WWAN connection through NM - OK!

13:33:35 Shut down MM

Tore
Comment 36 Tore Anderson 2015-01-17 12:47:01 UTC
Created attachment 112386 [details]
ModemManager debug output cf. comment #35
Comment 37 Tore Anderson 2015-01-17 12:47:41 UTC
Created attachment 112387 [details]
mbim-proxy debug output cf. comment #35
Comment 38 Tore Anderson 2015-01-17 12:48:15 UTC
Created attachment 112388 [details]
mmcli -m $idx output cf. comment #35
Comment 39 Aleksander Morgado 2015-01-17 15:05:29 UTC
Looking better, yes.

The problem in this last try you did was that mbim-proxy didn't detect the not-opened issue until the first message was sent from MM. At this point, it probably is worth to update MBIM port probing in MM so that not only open/close is done (which doesn't send any message to the modem when proxy used), but also a real command message, which never gets filtered out in the proxy. With that, and some retries logic, the detection of the need of a reopen should be almost instant. Let me prepare something...
Comment 40 Aleksander Morgado 2015-01-17 18:39:07 UTC
Ok, I was finally able to setup an environment to test this myself. My test setup is a Telit LN930 in MBIM mode connected via an externally powered minipci<->usb adapter. When I set the PC to suspend the power adapter of the modem is still on. I then just need to disconnect the USB and reconnect it before resuming from suspend, and I'll get NotOpened errors right away.

I developed some additional fixes in libmbim, so that when a client comes with a new Proxy Config Set request, we'll explicitly check if the session is open, using a Device Caps Query message. This request will get a NotOpened error and we'll silently re-open the device internally.

Please test branches:
 * libmbim "aleksander/internal-open-checks"
 * ModemManager "aleksander/suspend-resume"
Comment 41 Tore Anderson 2015-01-18 11:34:00 UTC
(In reply to Aleksander Morgado from comment #40)

> Please test branches:
>  * libmbim "aleksander/internal-open-checks"
>  * ModemManager "aleksander/suspend-resume"

Done. Couldn't break these. Booted up, did a 5-6 suspend/resume cycles. The modem was re-initialised correctly after each resume, and was confirmed to be operational (e.g. I could send SMS or connect to WWAN) after each cycle.

Let me know if you're interested in debug logs to confirm anything, but as far as I'm concerned this bug could be closed after these branches are merged onto master. Thanks! :-)

Tore
Comment 42 Aleksander Morgado 2015-01-18 19:45:26 UTC
(In reply to Tore Anderson from comment #41)
> (In reply to Aleksander Morgado from comment #40)
> 
> > Please test branches:
> >  * libmbim "aleksander/internal-open-checks"
> >  * ModemManager "aleksander/suspend-resume"
> 
> Done. Couldn't break these. Booted up, did a 5-6 suspend/resume cycles. The
> modem was re-initialised correctly after each resume, and was confirmed to
> be operational (e.g. I could send SMS or connect to WWAN) after each cycle.
> 
> Let me know if you're interested in debug logs to confirm anything, but as
> far as I'm concerned this bug could be closed after these branches are
> merged onto master. Thanks! :-)
> 

yey!

Dan, any comment on the suspend-resume branch? I'm not sure the upower backend should be supported any more, as the Suspending/Resuming signals were long gone from upowerd's DBus interfaces...
Comment 43 Aleksander Morgado 2015-02-23 11:35:11 UTC
This was merged to git master already.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.