Managed IT Lessons Learned from the Crowdstrike Incident

On July 19, 2024, an incident involving the cybersecurity company Crowdstrike disrupted information technology systems across the world. Crowdstrike provides IT security services to many multinational corporations with significant risk exposure. Within hours of Crowdstrike pushing an update to its customers, affected systems were stuck in an endless loop of crashes and restarts.

Although a solution eventually arrived, the incident caused billions of dollars in damage. What can IT support managed services customers learn from the Crowdstrike incident? Here are eight issues that deserve heightened awareness.

1. Know Your Managed Services Update Schedule

One of the worst aspects of the Crowdstrike incident was its timing. The services provider rolled out the update on a Friday, and the resulting outages, which hit everyone from banks to airlines, stretched into the weekend. While the faulty update was an accident, it caused more damage than the best hackers could dream of. Worse, the timing meant that IT teams had to labor over the weekend to implement the eventual fix.

To be clear, the remote update process is one of the major benefits of the IT support managed services model. After all, pushing updates to tens, hundreds or even thousands of systems by hand is not feasible at many organizations. However, the scheduling of updates needs to make sense for your operation. You don’t want an incident to hit when you’re also at peak demand.
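As a rough illustration of what a sensible schedule can look like in practice, the sketch below checks whether a proposed rollout time avoids Fridays, weekends and peak business hours. The blocked days, the peak-hour window and the function name rollout_allowed are all assumptions made for this example, not settings from any particular managed services product. Your own blackout periods will depend on when your operation is busiest.

```python
from datetime import datetime, time

# Hypothetical policy values; adjust them to your own operation.
BLOCKED_WEEKDAYS = {4, 5, 6}             # Friday, Saturday, Sunday (Monday is 0)
PEAK_HOURS = (time(8, 0), time(18, 0))   # assumed peak-demand window


def rollout_allowed(start: datetime) -> bool:
    """Return True if an update may begin rolling out at the given time."""
    # Avoid days where a failure would spill into the weekend.
    if start.weekday() in BLOCKED_WEEKDAYS:
        return False
    # Avoid the hours when the business is under the most load.
    peak_start, peak_end = PEAK_HOURS
    return not (peak_start <= start.time() < peak_end)
```

Under these assumed rules, a rollout on Friday, July 19, 2024 would be rejected at any hour, while one at 2:00 on the preceding Tuesday would pass.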

2. Onsite Support Matters

Many of the affected systems in the Crowdstrike incident had to be repaired by hand. While the optimal version of the managed model should require few in-person visits, you want to know that your services provider is capable of putting technicians onsite. For example, Zetta IT solutions offers everything from remote monitoring to onsite solutions. The company is one of the top managed service providers in Perth. Customers get the best of both worlds because remote solutions keep costs down while onsite availability provides maximum assurance.

3. Not All Hazards Are External

In the IT world, people often fall into the trap of only looking outwards for potential hazards. With Crowdstrike, however, the incident stemmed from a routine update. In other words, the cause was normal activity by a trusted third-party vendor, not a hostile actor. Risk planning should therefore cover changes made by your own vendors as well as attacks.

4. Know How Deep the IT Support Managed Services Go

Crowdstrike is known for taking an especially aggressive approach to cybersecurity, even by the standards of hardcore security people. The affected systems in the incident were Microsoft Windows computers, and the updates from Crowdstrike operate at the operating system level. In other words, the updates involve software that can modify the OS in ways that Microsoft would never warrant.

Managed IT customers should understand how deep the system goes. While the Crowdstrike model is unusual, it isn’t unknown. If there is a risk of unexpected modifications to the system, you need to know that before ever signing a contract.

5. Transparency and Disclosure Matter

One thing Crowdstrike did well was to be as transparent as possible. The company quickly disclosed the nature and causes of the incident. It also took full responsibility for what happened. Consequently, its customers were able to quickly take essential systems offline. While there was financial damage, no physical harm appears to have come from the update incident.

Monitoring is a critical part of operating a managed IT infrastructure. However, monitoring is only good if there’s transparency. A good managed services firm, such as Zetta IT solutions, should keep the customer in the loop about everything. Even if the company is dealing with a small update, notifications and transparency build trust and help you make decisions.

6. Have a Response Plan

Always have a response plan. Even the best managed IT providers on Earth can’t do it all on their own. Know who to contact at your service provider. Have multiple points of contact. Make sure there are designated people at your organization who’ll handle these contacts. Also, make sure at least one of them is around. You don’t want to find out that both of your people who know the managed IT system went on vacation on the same day.
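One way to catch the vacation problem ahead of time is to compare the leave calendars of your designated contacts. The sketch below is only an illustration: the function name uncovered_days and the shape of the leave data (a name mapped to a list of first-day and last-day pairs) are assumptions for this example, not part of any particular HR or ticketing system.

```python
from datetime import date, timedelta


def uncovered_days(leave: dict[str, list[tuple[date, date]]],
                   start: date, end: date) -> list[date]:
    """Return the days in [start, end] on which every contact is on leave."""
    gaps = []
    day = start
    while day <= end:
        # A contact is away if the day falls inside any of their leave periods.
        everyone_away = all(
            any(first <= day <= last for first, last in periods)
            for periods in leave.values()
        )
        # With no contacts listed at all, there is nothing to report.
        if leave and everyone_away:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

For example, if one contact is away from July 15 to July 19 and the other from July 18 to July 22, the function reports July 18 and July 19 as days with nobody available.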

Have a plan for notifying all stakeholders within your business. Know who is supposed to contact the stakeholders and under what circumstances.

Also, know your service provider’s response plan. Discuss it in detail. For example, what is the process for shutting down systems to prevent an incident from getting out of hand? Does the service provider have a plan for restoring systems from backups? Which conditions would trigger onsite support?

7. Consider Delaying Some Updates

A common way to guard against this type of problem is to delay all non-critical updates, especially if they involve software from outside parties. For example, you might delay your Windows updates for at least a few days so that a bad one is more likely to be caught before it reaches your systems. You can make exceptions in the system policies for critical updates, such as ones that patch major vulnerabilities.
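The deferral rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the Update record, the function names and the delay lengths of three and seven days are invented for the example and do not reflect the settings of Windows or any other product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Update:
    name: str
    released: date
    critical: bool      # e.g. patches a major vulnerability
    third_party: bool   # comes from an outside vendor


def earliest_install_date(update: Update,
                          first_party_delay_days: int = 3,
                          third_party_delay_days: int = 7) -> date:
    """Return the first day the update may be installed under the policy."""
    # Exception: critical patches are not delayed.
    if update.critical:
        return update.released
    # Assumed policy: outside software waits longer than first-party software.
    delay = third_party_delay_days if update.third_party else first_party_delay_days
    return update.released + timedelta(days=delay)


def is_installable(update: Update, today: date) -> bool:
    """Return True once the deferral period for the update has passed."""
    return today >= earliest_install_date(update)
```

With these assumed delays, a non-critical third-party update released on July 19 would wait until July 26, while a critical one could go out the same day.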

8. Resiliency Matters

One reason the Crowdstrike incident didn’t cause more trouble is that the customers’ systems were resilient. While the incident might have shut down some servers, terminals and computers, it didn’t lead to notable data losses. Although the systems were unavailable, the data was intact thanks to backups and other redundancies. Once a fix for the faulty update was available and tested, the systems were able to resume normal operation.
