NTA - Network issue – Incident details


Network issue

Resolved
Major outage
Started 9 months ago
Lasted about 23 hours

Affected

Inbound Calls

Major outage from 1:26 PM to 3:40 PM, Operational from 3:40 PM to 12:38 PM

Outbound Calls

Major outage from 1:26 PM to 3:40 PM, Operational from 3:40 PM to 12:38 PM

Web Interface

Major outage from 1:26 PM to 3:40 PM, Operational from 3:40 PM to 12:38 PM

Busy Lamps

Major outage from 1:26 PM to 12:38 PM

API

Operational from 1:26 PM to 12:38 PM

Secondary Services

Major outage from 1:26 PM to 3:40 PM, Operational from 3:40 PM to 12:38 PM

Updates
  • Resolved

    Incident Resolution and Future Plans

    On the morning of 25/03, a memory leak on our SIP server left multiple extensions unable to register. After we migrated to our secondary server and rebooted the main server, the two servers deadlocked and neither could provide service. To clear the deadlock, we rebooted both servers, and our Disaster Recovery solution redirected inbound calls during the process.

    When the SIP servers came back online, the issues recurred: a flood of BLF (busy lamp field) requests overwhelmed the servers and caused system instability. This highlighted the need for a more robust way to handle such scenarios in future.

    We are currently updating the SIP server and moving to a more advanced version of our redundancy software, with a particular focus on handling BLF requests efficiently. The new BLF approach and the updated SIP servers are being treated as critical projects and will be tested extensively before release.

    We plan to implement these updates on a Saturday night after 10 PM so that any minor instability during the transition has minimal impact, and we will give advance notice before proceeding. We sincerely apologize for the inconvenience caused. We are using this experience to improve the redundancy and reliability of our systems so that they meet your expectations.

  • Update

    We now believe the issue is resolved but will keep monitoring. An RFO (reason for outage) will be issued in due course.

  • Update

    We have implemented a fix and are monitoring the result. A few calls are stuck and are being cleared as soon as possible.

  • Monitoring

    The system is now back up and working. However, the BLF function will remain off until this evening so that it can be brought back online when traffic is lower and the network has fully stabilised.

  • Update

    Unfortunately, the fix we applied has not cleared the issue. We are continuing to work on a resolution and apologise again for any inconvenience.

  • Update

    We are continuing to work on a fix for this incident. The network is slowly returning to normal, and we hope it will be fully functioning shortly.

  • Identified

    BLF monitoring will shortly be disabled. Connections will then be dropped momentarily and the servers restarted to resolve the issue.

  • Investigating

    We are aware of an issue affecting our network. We are working to resolve it and apologise for any disruption this is causing you.