Modem Lock-up Issue

  • #6104
    Mark Gajdosik
    Participant

    Hello,

    We have a rare but serious issue with the wireless modem in our MTCDP-H5 units. I will try to describe it as clearly as I can:

    1. The modem is working fine for a number of hours/days. PPPD is using /dev/modem_at0. Internet connection is tested by an occasional ping (every 5 min). Once a minute, we check /dev/modem_at1 for incoming text messages. Every 4 hours, we issue AT+COPS to re-register with the network.

    2. Suddenly, checking for text messages fails. The symptom varies from unit to unit, but it is usually one of:

    • AT+CMGL timing out (after 60 seconds)
    • Input/output error (5) when sending the AT+CMGL command
    • /dev/modem_at1 completely disappearing from the system

    3. At the same time, the internet connection drops (ping failures). This puts the unit offline.

    4. In this situation, our software will do the following:

        Attempt to re-open /dev/modem_at1 once a minute, up to 10 times, before issuing radio-reset-h5, then up to another 10 times before requesting a system reboot.
        Restart PPPD every 3 minutes and wait to see if a route through the PPP interface appears.

    This procedure works well, but on some units, the modem ports in /dev never come back (unless we power-cycle).
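
    One way to read the escalation policy in step 4 is as a function of the number of consecutive failed attempts to open /dev/modem_at1. The sketch below is only an illustration of that reading: the function name next_action and the exact thresholds (reset on the 10th failure, reboot from the 20th) are assumptions, not the code running on our units.

```shell
#!/bin/sh
# Hypothetical helper: map a count of consecutive failures to the next step.
# Assumed thresholds: radio reset on the 10th failure, reboot from the 20th on.
next_action() {
    failures=$1
    if [ "$failures" -ge 20 ]; then
        echo reboot
    elif [ "$failures" -eq 10 ]; then
        echo radio-reset
    else
        echo retry
    fi
}
```

    A caller would invoke this once a minute after each failed open and act on the printed word, for example running /usr/sbin/radio-reset-h5 when it prints radio-reset.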

    5. When checking the dmesg output, we can see that the modem, identified as:

    usb 1-1: new full speed USB device using at91_ohci and address 2
    usb 1-1: New USB device found, idVendor=1bc7, idProduct=0021
    usb 1-1: New USB device strings: Mfr=18, Product=19, SerialNumber=20
    usb 1-1: Product: Telit Wireless Module
    usb 1-1: Manufacturer: Telit wireless solutions
    usb 1-1: SerialNumber: 351579050569209
    

    disconnects from the USB host and, if we are lucky, reconnects.

    To me, it seems the modem either crashes or freezes.

    How can we approach this issue? Is there any way to power down the modem for a short period of time before powering it back up?

    #6106
    Darrik Spaude
    Keymaster

    Hi Mark,

    I’m going to create a case for you in the Support Portal since this needs more research.

    #7510
    Rafael Hernández
    Participant

    Hello,
    I am having the same issue. After detecting the modem lock-up, I used to reboot the MTCDP. In a number of cases a soft reboot was not enough to get the modem working again; it needed a full power-down.
    Playing around, I discovered a way to recover the modem: perform an old-fashioned reset before calling radio-reset-h5:

    
    /usr/sbin/mts-io-sysfs store radio-reset 0
    /bin/sleep 5s
    /usr/sbin/radio-reset-h5
    

    On my devices, I still reboot instead of calling radio-reset-h5.

    #7512
    Mark Gajdosik
    Participant

    Hi Rafael,

    MultiTech suggested the same thing to us. It does seem to overcome the issue, but we have one stubborn site where this solution does not work.

    Would you have any dmesg output from the moment the modem plays up, please? It would be helpful to compare the logs to see whether the errors manifest in the same way.

    #7538
    Rafael Hernández
    Participant

    I’ll try to catch some for you.
    It is true that the sequence

    old-radio-reset + reboot

    does not always work. In some cases, my mtcdp needs 2-3 retries before detecting the modem again.

    #7568
    Rafael Hernández
    Participant

    Mark, what I see from the logs is this:

    When the modem doesn’t work on startup, it is not because the mtcdp doesn’t recognize it. It does recognize it and sets it up correctly. The problem comes a few seconds after that: there is an entry in the messages log stating

    mtcdp user.info kernel: usb 1-1: USB disconnect, address XX

    and then it is rediscovered again

    mtcdp user.info kernel: usb 1-1: new full speed USB device using at91_ohci and address YY

    The rediscovery process enters an endless loop. Externally, you may see the LS LED blink once every 7 seconds.
    Should this happen, the two instructions

    /usr/sbin/mts-io-sysfs store radio-reset 0
    /usr/sbin/radio-reset-h5

    resolve the situation.
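
    The disconnect/rediscovery loop described above can be spotted from the kernel log by counting disconnect messages for the modem's port. This is a minimal sketch, not something taken from the MTCDP firmware: the function names and the threshold argument are made up for illustration, and it assumes the modem always enumerates as usb 1-1, as in the logs in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count "USB disconnect" lines for port usb 1-1
# in kernel log text supplied on stdin.
count_modem_disconnects() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status to keep callers simple.
    grep -c 'usb 1-1: USB disconnect' || true
}

# Hypothetical helper: succeed if the log on stdin contains at least
# $1 modem disconnects (the threshold is an assumption to tune per site).
modem_is_looping() {
    n=$(count_modem_disconnects)
    [ "$n" -ge "$1" ]
}
```

    For example, dmesg | modem_is_looping 3 could gate the two reset instructions. Note that dmesg reports everything still in the kernel ring buffer, so the count is cumulative rather than limited to the last few minutes.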

    #7569
    Rafael Hernández
    Participant

    Digging a little deeper, /usr/sbin/radio-reset-h5 turns out to be a simple script:

    rmmod ohci_hcd
    mts-io-sysfs store radio-reset 0
    sleep 8
    modprobe ohci_hcd

    It already contains the “mts-io-sysfs store radio-reset 0” instruction!

    On the other hand, be careful with the environment when you invoke radio-reset-h5: it assumes that PATH is correctly initialized, which may not be true in, for instance, tasks triggered by cron. In that case it fails like this:

    /usr/sbin/radio-reset-h5: line 11: rmmod: not found
    /usr/sbin/radio-reset-h5: line 12: mts-io-sysfs: not found
    /usr/sbin/radio-reset-h5: line 14: modprobe: not found

    So, maybe, this is the reason why radio-reset-h5 does not always recover the modem.
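
    One way to guard against the PATH problem is to set PATH explicitly and check for the required tools before calling the script. The sketch below is an assumption-laden illustration: missing_tools and safe_radio_reset are hypothetical names, and the directory list is based only on the paths mentioned in this thread (/sbin for rmmod and modprobe, /usr/sbin for mts-io-sysfs).

```shell
#!/bin/sh
# Cron may start jobs with a minimal PATH, so set one explicitly.
PATH=/usr/sbin:/usr/bin:/sbin:/bin
export PATH

# Print the name of every required tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Hypothetical wrapper: refuse to run the reset if a tool it needs is missing.
safe_radio_reset() {
    missing=$(missing_tools rmmod modprobe mts-io-sysfs)
    if [ -n "$missing" ]; then
        echo "radio reset skipped, not found on PATH:" $missing >&2
        return 1
    fi
    /usr/sbin/radio-reset-h5
}
```

    Calling safe_radio_reset from a cron job instead of radio-reset-h5 directly would at least turn the silent "not found" failures into an explicit error.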

    #7570
    Mark Gajdosik
    Participant

    Rafael,

    I do what the script does, plus the extra radio reset, from inside our application:

    1. Shut down PPPD, OpenVPN.
    2. Stop checking for SMS messages in /dev/modem_at1.
    3. Wait 30 seconds.
    
    4. Write 0 into /sys/devices/platform/mtcdp/radio-reset
    5. Wait 8 seconds.
    6. /sbin/rmmod ohci_hcd
    7. Wait 8 seconds.
    8. Write 0 into /sys/devices/platform/mtcdp/radio-reset
    9. Wait 8 seconds.
    10. /sbin/modprobe ohci_hcd
    
    11. Wait 30 seconds.
    12. Restart PPPD, OpenVPN etc.

    This did help on one of our sites which couldn’t stay online for longer than a day.
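
    For anyone who wants to try the same thing from a shell rather than from an application, steps 4 to 10 above could be sketched roughly as follows. This is not our production code: the function name and the overridable variables are assumptions added so the sequence can be dry-run on a desk, and stopping and restarting PPPD and OpenVPN (steps 1 to 3, 11 and 12) is left to the caller.

```shell
#!/bin/sh
# Defaults follow the paths in the steps above; override them for a dry run.
RADIO_RESET=${RADIO_RESET:-/sys/devices/platform/mtcdp/radio-reset}
RMMOD=${RMMOD:-/sbin/rmmod}
MODPROBE=${MODPROBE:-/sbin/modprobe}
PAUSE=${PAUSE:-8}   # seconds to wait between steps

# Hypothetical function covering steps 4-10 of the sequence.
hard_radio_reset() {
    echo 0 > "$RADIO_RESET" || return 1    # step 4: first radio reset
    sleep "$PAUSE"                          # step 5
    # step 6: unload the USB host driver; keep going if it was not loaded
    "$RMMOD" ohci_hcd || echo "rmmod ohci_hcd failed, continuing" >&2
    sleep "$PAUSE"                          # step 7
    echo 0 > "$RADIO_RESET" || return 1    # step 8: second radio reset
    sleep "$PAUSE"                          # step 9
    "$MODPROBE" ohci_hcd                    # step 10: reload the driver
}
```

    A dry run on a development machine can point RADIO_RESET at a temporary file and set RMMOD=true MODPROBE=true PAUSE=0 before calling hard_radio_reset.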

    • This reply was modified 9 years, 7 months ago by Mark Gajdosik.
    #7572
    Mark Gajdosik
    Participant

    Regarding the logs, I had the same issue.

    1. The modem disconnects without any warning.
    2. It then reconnects back.
    3. It is assigned an address which it does not accept and the cycle repeats forever, even after a reboot.

    The radio reset sequence above helped with this issue. We have also seen the following, and the same sequence helped there too:

    g_serial gadget: Gadget Serial v2.4
    g_serial gadget: g_serial ready
    #### Everything works normal until here ####
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    tty_port_close_start: tty->count = 1 port count = 0.
    #### This is when the radio reset script kicks in ####
    usb 1-1: USB disconnect, address 2
    usb 1-1: new full speed USB device using at91_ohci and address 4
    at91_ohci at91_ohci: remove, state 1
    usb usb1: USB disconnect, address 1
    ftdi_sio ttyUSB0: usb_serial_generic_submit_read_urb - error submitting urb: -19
    usb 1-1: device not accepting address 4, error -62
    hub 1-0:1.0: cannot disable port 1 (err = -19)
    hub 1-0:1.0: cannot reset port 1 (err = -19)
    hub 1-0:1.0: cannot disable port 1 (err = -19)
    hub 1-0:1.0: cannot reset port 1 (err = -19)
    hub 1-0:1.0: cannot disable port 1 (err = -19)
    hub 1-0:1.0: cannot reset port 1 (err = -19)
    hub 1-0:1.0: cannot disable port 1 (err = -19)
    hub 1-0:1.0: unable to enumerate USB device on port 1
    hub 1-0:1.0: cannot disable port 1 (err = -19)
    usb 1-2: USB disconnect, address 3
    ftdi_sio ttyUSB0: FTDI USB Serial Device converter now disconnected from ttyUSB0
    ftdi_sio 1-2:1.0: device disconnected
    at91_ohci at91_ohci: USB bus 1 deregistered
    #### ohci_hcd is modprobed back into the system ####
    ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
    at91_ohci at91_ohci: AT91 OHCI
    at91_ohci at91_ohci: new USB bus registered, assigned bus number 1
    at91_ohci at91_ohci: irq 20, io mem 0x00500000
    usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
    usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
    usb usb1: Product: AT91 OHCI
    usb usb1: Manufacturer: Linux 2.6.35.14+ ohci_hcd
    usb usb1: SerialNumber: at91
    hub 1-0:1.0: USB hub found
    hub 1-0:1.0: 2 ports detected
    usb 1-1: new full speed USB device using at91_ohci and address 2
    usb 1-1: New USB device found, idVendor=1bc7, idProduct=0021
    usb 1-1: New USB device strings: Mfr=18, Product=19, SerialNumber=20
    usb 1-1: Product: Telit Wireless Module
    usb 1-1: Manufacturer: Telit wireless solutions
    usb 1-1: SerialNumber: 351579052169503

    The script doesn’t prevent these things from happening; it is only used as a last resort. Also, some of our sites have never had a problem like this, while others are very unstable.
