Cable / Telecom News

Measures taken by Rogers to improve network resiliency and reliability after July 2022 outage are satisfactory: report


The combination of measures that Rogers undertook after its major service outage in July 2022 is satisfactory for improving the resiliency and reliability of the Rogers network and for addressing the root cause of the outage, according to a report by Xona Partners Inc. released last week by the CRTC.

The report, commissioned by the CRTC via a contract awarded in 2023, details the results of an independent assessment of the Rogers network architecture for reliability and resiliency, as well as the processes in place at Rogers to manage network changes and respond to network incidents like outages.

To address the root cause of the July 2022 outage and deficiencies in its management network architecture, Rogers has taken measures that include implementing safeguards in the configuration of its core network routers. The safeguards are intended to prevent a recurrence of the flooding of IP routing data that triggered the outage, which resulted from an error in configuring the distribution routers within the Rogers IP network.

Rogers has also implemented a separate physical and logical management network to access network elements for troubleshooting and root cause analysis. In addition, Rogers has deployed backup connectivity from third-party service providers to its network operation centre and other critical remote infrastructure sites, and invested in tools that would help validate router configuration changes, the report says.

Furthermore, Rogers has made improvements to its change management and incident management processes. Improvements to the change management process include: a new risk assessment algorithm; organizational changes to improve collaboration between network operations and engineering teams; an enhanced process for introducing new equipment and technology; improvements in implementing network changes such as introducing automation to streamline the change management process; and additional lab testing of planned network configuration changes.

Incident management process improvements include: bolstering Rogers’s incident management guidelines to encompass various outage scenarios; streamlining its incident response with well-defined leadership roles; implementing a solution for prioritization of alarms during outages; enhancing automated rollbacks to previous configurations when new changes are not successful; and implementing additional measures to improve its communication protocols. Rogers has also equipped all of its incident response and crisis management team members with backup communications from third-party service providers to maintain communication capabilities during outages.

Following the July 2022 outage, Rogers announced it would separate the IP core network for its wireless and wireline networks, thereby ensuring one IP core network would remain operational if the other were affected by an outage. The 2022 outage affected all of Rogers’s services because its wireless and wireline networks shared a common core IP network. The Xona Partners report says Rogers has not yet finalized the implementation of the IP core network separation, which remains a work in progress.

While the report says the measures Rogers has taken so far are satisfactory, it offers several recommendations for additional measures Rogers could undertake to further improve its network resilience, including: testing emergency roaming with other mobile network operators and expanding that testing to include a more comprehensive set of test scenarios; developing a detailed root cause analysis for future major outages; ensuring wide coverage and rigour in testing configuration changes; expanding the scope of incident management drills; institutionalizing learning from its own and other service providers’ network failures to implement preventive actions, minimize the impact of network outages, and enhance quality of service; informing customers how to reach 911 services during an outage; and sharing outage root causes and mitigation strategies with the broader internet community (represented by bodies such as the North American Network Operators’ Group) to help other telecom network operators prevent similar network failures.