Microsoft IT outage: How firms can avoid disruptions in future

Click here to view original web page at www.businessdailyafrica.com
IT support engineer working in a dark server room. Shutterstock

Following the tech outage that hit Microsoft systems a little over a week ago, causing massive and far-reaching disruptions at corporate firms across the globe, information technology specialists are exploring fallback options that businesses can adopt as a shield against future incidents.

Those who spoke to the Business Daily acknowledged that although an outage of this nature does not pose a critical risk to companies' network systems in terms of data loss or security breaches, the denial of service and the attendant backlogs from delayed transactions expose firms to reputational damage and financial loss.

The techies stressed the need for firms to set up in-house backup infrastructure to minimise downtime.

“The issue in this case was caused by an antivirus software ‘faulty update’ which was rolled back by CrowdStrike, a cybersecurity firm that provides antivirus software to Microsoft for its Windows devices,” noted Philip Oyier, an IT lecturer at the Jomo Kenyatta University of Agriculture and Technology.

Hybrid back-up system

“Since Microsoft is a core infrastructure for many companies because of the Azure cloud platform, a hybrid back-up solution is the best, with on-premise primary backup infrastructure and replication on secondary private cloud-based back-up,” he said.

“This can be configured to deliver data access quickly in an emergency, while providing a secure repository of all kinds of data, databases, virtual machines and applications,” he added.

Oyier’s sentiments are echoed by Nairobi-based software engineer Gathirwa Irungu, who adds that a well-documented and tested disaster recovery plan enables swift action during outages, minimising the impact of downtime.

Proactive monitoring

Additionally, he states, maintaining active support contracts with software vendors as well as engaging security firms for proactive monitoring can facilitate rapid troubleshooting and resolution.

“Employing system redundancy, with failover mechanisms and load balancers, ensures that critical operations can continue even if some systems go down,” stated Mr Irungu.
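As a rough illustration of the failover idea Mr Irungu describes, the sketch below tries a list of redundant backends in priority order and returns the first successful response. The function name `call_with_failover` and the backend names are hypothetical, chosen only for this example; a production setup would normally rely on a load balancer or service mesh rather than hand-written code.

```python
def call_with_failover(backends, request):
    """Try each (name, handler) pair in priority order.

    Returns (name, response) from the first backend that succeeds.
    Raises RuntimeError if every backend fails.
    """
    errors = []
    for name, handler in backends:
        try:
            return name, handler(request)
        except Exception as exc:  # illustrative only; real code would catch narrower errors
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all backends failed: {errors}")


# Hypothetical handlers standing in for a primary system and its standby.
def primary(request):
    raise ConnectionError("primary is down")


def secondary(request):
    return f"processed {request}"


result = call_with_failover([("primary", primary), ("secondary", secondary)], "payment-001")
```

Here the primary handler fails, so the request is served by the standby and operations continue.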

“Maintaining a regular cloud backup routine also ensures that data can be quickly restored in case of a system failure. In addition to that, companies should implement robust update management practices such as staged rollouts and periodic thorough testing, which can prevent faulty updates from causing widespread issues.”
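A staged rollout of the kind recommended above can be sketched as follows. This is a minimal illustration under stated assumptions, not any vendor's actual process: the ring sizes, the 2 percent failure threshold, and the names `RINGS`, `bucket` and `staged_rollout` are all hypothetical. The update reaches a small slice of machines first and is halted before wider release if too many of them fail.

```python
import hashlib

# Assumed cumulative share of the fleet reached at each stage: 1%, 10%, 50%, 100%.
RINGS = [0.01, 0.10, 0.50, 1.00]
# Assumed threshold: halt if more than 2% of the hosts in a stage fail.
MAX_FAILURE_RATE = 0.02


def bucket(host_id: str) -> float:
    """Map a host ID to a stable value in [0, 1) so its ring never changes."""
    digest = hashlib.sha256(host_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def staged_rollout(hosts, apply_update):
    """Apply an update ring by ring, stopping when a ring exceeds the threshold.

    `apply_update(host)` is supplied by the caller and returns True on success.
    Returns (hosts_that_received_the_update, halted).
    """
    touched = []
    previous_share = 0.0
    for share in RINGS:
        ring = [h for h in hosts if previous_share <= bucket(h) < share]
        failures = sum(1 for h in ring if not apply_update(h))
        touched.extend(ring)
        if ring and failures / len(ring) > MAX_FAILURE_RATE:
            return touched, True  # later rings never receive the update
        previous_share = share
    return touched, False
```

With a faulty update, only the machines in the early rings are affected, which leaves the rest of the fleet running while the problem is investigated.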

Proper staff skilling

Brian Oroni, Chief Technology Officer at travel technology firm Mobiticket, emphasised staff preparedness through proper skilling, noting that this could go a long way towards resolving a crisis within a relatively short time.

“If you are a critical service provider like banks, airlines, medical institutions and media outlets, you need to have a risk management team that is on standby and full-time available. This team will actively work with the provider to solve any emerging issue within the shortest time possible,” said Mr Oroni.

“This is time consuming and expensive for providers with many computers but if a few units are returned back to service as soon as possible the business outage will be minimised.”

Multiple operating systems

Other coping mechanisms that the pundits recommend include the use of multiple operating systems. Such diversification, they note, reduces the risk of a failure in a single system affecting all of an organisation's operations, which in turn enhances a firm’s overall resilience.

“This multi-OS (operating systems) approach can help ensure that if one system encounters issues, like the Microsoft OS did, others like Linux OS can continue to function, maintaining business continuity,” said Mr Irungu.

On the penultimate Friday of this month, the worldwide Microsoft glitch disrupted multiple industries, bringing down banking and healthcare systems and halting airline flights, while some broadcasters went off air.

The aviation sector was hit particularly hard due to its sensitivity to timings, with major airlines including national carrier Kenya Airways reporting delays and flight disruptions.

In an update during the course of the day, Microsoft said users were unable to access various Office 365 apps and services due to a “configuration change in a portion of our Azure-backed workloads”.

An alert sent by CrowdStrike to its clients said the company’s “Falcon Sensor” software was causing Microsoft Windows to crash and display a blue screen, known informally as the “Blue Screen of Death”.

→ kmwangi@ke.nationmedia.com