Incident Resolution and Future Plans
On the morning of 25/03, a memory leak on our main SIP server left multiple extensions unable to register. We migrated service to our secondary server and rebooted the main server, but the two servers then entered a deadlock that prevented either one from providing service. To resolve this, we rebooted both servers, and our Disaster Recovery solution redirected inbound calls while the reboot was in progress.
When the SIP servers came back online, the issues recurred: a flood of BLF (Busy Lamp Field) requests overwhelmed the server and made the system unstable. This highlighted the need for a more robust way to handle such surges in the future.
We are currently updating the SIP server and moving to a more advanced version of our redundancy software, with a particular focus on handling BLF requests efficiently. The new BLF approach and the updated SIP servers are now being treated as critical projects and will undergo extensive testing before release.
We plan to implement these updates on a Saturday night after 10 PM to limit the impact of any minor instability during the transition, and we will notify you in advance before proceeding. We sincerely apologize for the inconvenience caused. We are using this experience to improve the redundancy and reliability of our systems and to deliver a solution that meets your expectations.