John Leyden
Senior Writer

CrowdStrike blames testing shortcomings for Windows meltdown

News
24 Jul 2024 · 5 mins
Endpoint Protection, Incident Response, Security

Customers will be given more control over when and where content is downloaded to reduce the risk of similar incidents in the future.

CrowdStrike has blamed a hole in its testing software for the release of a defective content update that hobbled millions of Windows computers worldwide on Friday, July 19.

The hole caused CrowdStrike’s Content Validator tool to miss a flaw in an update for the security vendor’s Falcon Sensor endpoint protection technology. Windows machines that received the update crashed with the infamous Blue Screen of Death (BSOD) and were then forced into a repetitive boot loop that left them unusable.

In its preliminary post-incident review, CrowdStrike confirmed that the crashing of its customers’ computers was due to a flaw in Channel File 291, part of a sensor configuration update released to Windows systems at 04:09 UTC on July 19. In the review it provided an initial explanation for how that flaw came to be deployed, and outlined changes it is making to its processes to avoid a repeat.

CrowdStrike isn’t the only organization considering changes in the wake of the incident: Many CIOs are also rethinking their reliance on cloud software like CrowdStrike’s.


Testing shortcomings exposed

CrowdStrike’s review described the rigorous testing process it applies to new versions of its software agent and the default data files that accompany them — what it calls Sensor Content — but said that the flaw was in a type of exploit signature update it calls Rapid Response Content, which goes through less-rigorous checks.

Customers have the option of operating with the latest version of Sensor Content, or with either of the two previous versions if they prefer to favor reliability over coverage of the most recent attacks. Rapid Response Content, however, is deployed automatically to compatible sensor versions.

Rapid Response Content is stored in a proprietary binary file that contains configuration data rather than code. The files are delivered as configuration updates to the Falcon sensor, making the platform better able to detect the hallmarks of malicious activity based on behavior recognition.

CrowdStrike uses its Content Configuration System to create so-called Template Instances describing the behavior to be detected, storing them in Channel Files that it then tests with a tool called the Content Validator.
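CrowdStrike has not published the Content Validator’s internals, but the kind of pre-release check it performs can be illustrated with a minimal sketch. Everything here is hypothetical: the class names, the field-count rule, and the file layout are assumptions for illustration, not CrowdStrike’s actual format.

```python
# Hypothetical sketch of a pre-deployment content check; the real
# Content Validator's rules and file format are not public.
from dataclasses import dataclass


@dataclass
class TemplateInstance:
    name: str
    fields: list[str]  # match criteria the sensor would evaluate


EXPECTED_FIELD_COUNT = 21  # illustrative constraint, not CrowdStrike's


def validate(instances: list[TemplateInstance]) -> list[str]:
    """Return a list of validation errors; an empty list means the
    channel file passes and is eligible for release."""
    errors = []
    for inst in instances:
        if not inst.name:
            errors.append("template instance missing a name")
        if len(inst.fields) != EXPECTED_FIELD_COUNT:
            errors.append(
                f"{inst.name}: expected {EXPECTED_FIELD_COUNT} fields, "
                f"got {len(inst.fields)}"
            )
    return errors
```

The point of such a gate is that a malformed Template Instance is rejected before deployment; the July 19 incident shows what happens when a bug lets problematic content slip past it.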

Countdown to disaster

Falcon Sensor 7.11 was made generally available to customers on February 28, introducing a new type of template to detect novel attack techniques on interprocess communications (IPC) that abuse so-called Named Pipes.

The first Channel File 291 was released to production on March 5 following a successful stress test. Template Instances that relied on Channel File 291 were released without problems on March 5, April 8 and April 24.

Disaster struck when two additional Template Instances were deployed on July 19. “Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data,” CrowdStrike said in its review.

What seemed like a minor configuration update to a component that had been tested and was already in production triggered a wave of crashes. Nevertheless, CrowdStrike argued it acted responsibly in the run-up to what turned out to be disaster.

“Based on the testing performed before the initial deployment of the Template Type (on March 05, 2024), trust in the checks performed in the Content Validator, and previous successful IPC Template Instance deployments, these instances were deployed into production,” CrowdStrike explained in its review.

“When received by the sensor and loaded into the Content Interpreter, problematic content in Channel File 291 resulted in an out-of-bounds memory read triggering an exception. This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash (BSOD),” it added.
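The failure mode CrowdStrike describes, reading past the end of the supplied data, is a classic parser hazard. A stripped-down user-space illustration (not CrowdStrike’s format or code) of how an unchecked offset read behaves:

```python
import struct


def read_u32(buf: bytes, offset: int) -> int:
    # struct.unpack_from raises struct.error when fewer than 4 bytes
    # remain past the offset -- the Python analogue of an out-of-bounds
    # read. In kernel code the same mistake reads arbitrary memory or
    # faults, which on Windows surfaces as a BSOD.
    return struct.unpack_from("<I", buf, offset)[0]


channel_data = bytes(8)   # illustrative 8-byte content blob
read_u32(channel_data, 4)  # fine: bytes 4..7 exist
# read_u32(channel_data, 6) would raise struct.error: only 2 bytes
# remain past offset 6, not the 4 the format expects
```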

Testing improvements

Going forward, CrowdStrike says, updates will be tested locally before being sent to clients. Content update and rollback testing will be carried out, and there will be additional stability and content interface testing.

Existing error handling procedures in the Content Interpreter will also be improved so that, for example, problematic content is caught and rejected rather than triggering an operating system crash.

CrowdStrike will also introduce a staggered deployment strategy for the Rapid Response Content that caused the July 19 incident, it said. It will initially release new content as a “canary deployment” to detect critical issues, then release it to larger and larger portions of its customer base. It will also enable customers to refuse the very latest content releases, offering “granular selection of when and where these updates are deployed,” it said.
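A staggered rollout of this kind is commonly implemented by hashing each host to a stable position in [0, 1) and gating deployment on the release’s current stage. The stage fractions and names below are generic assumptions, not CrowdStrike’s:

```python
import hashlib

# Illustrative rollout rings: fraction of hosts eligible at each stage,
# from a 1% canary up to full deployment.
STAGES = [0.01, 0.10, 0.50, 1.00]


def host_bucket(host_id: str) -> float:
    """Map a host to a stable value in [0, 1) via hashing, so the same
    host always lands in the same ring."""
    digest = hashlib.sha256(host_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def should_deploy(host_id: str, stage: int) -> bool:
    """True if this host falls inside the current rollout ring."""
    return host_bucket(host_id) < STAGES[stage]
```

Because the rings are nested, every canary host remains in each later stage, and a critical issue caught at stage 0 never reaches the other 99% of the fleet.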

Early reaction to CrowdStrike’s analysis and remediation plan from security experts, such as Kevin Beaumont, has been positive.

“CrowdStrike’s response has been really good post error,” Beaumont said in a thread on Twitter/X. “They clearly realise they need to prioritise safety now.”