this post was submitted on 21 Jul 2024

CrowdStrike Isn't the Real Problem (lemmy.world)

submitted 3 months ago by timewarp@lemmy.world to c/technology@lemmy.world

This is an unpopular opinion, and I get why – people crave a scapegoat. CrowdStrike undeniably pushed a faulty update demanding a low-level fix (booting into recovery). However, this incident lays bare the fragility of corporate IT, particularly for companies entrusted with vast amounts of sensitive personal information.

Robust disaster recovery plans, including automated processes to remotely reboot and remediate thousands of machines, aren't revolutionary. They're basic hygiene, especially when considering the potential consequences of a breach. Yet, this incident highlights a systemic failure across many organizations. While CrowdStrike erred, the real culprit is a culture of shortcuts and misplaced priorities within corporate IT.

Too often, companies throw millions at vendor contracts, lured by flashy promises and neglecting the due diligence necessary to ensure those solutions truly fit their needs. This is exacerbated by a corporate culture where CEOs, vice presidents, and managers are often more easily swayed by vendor kickbacks, gifts, and lavish trips than by investing in innovative ideas with measurable outcomes.

This misguided approach not only results in bloated IT budgets but also leaves companies vulnerable to precisely the kind of disruptions caused by the CrowdStrike incident. When decision-makers prioritize personal gain over the long-term health and security of their IT infrastructure, it's ultimately the customers and their data that suffer.

[–] breakingcups@lemmy.world 171 points 3 months ago (76 children)

Please, enlighten me how you'd remotely service a few thousand Bitlocker-locked machines, that won't boot far enough to get an internet connection, with non-tech-savvy users behind them. Pray tell what common "basic hygiene" practices would've helped, especially with Crowdstrike reportedly ignoring and bypassing the rollout policies set by their customers.

Not saying the rest of your post is wrong, but this stood out as easily glossed over.

[–] LrdThndr@lemmy.world 22 points 3 months ago* (last edited 3 months ago) (33 children)

A decade ago I worked for a regional chain of gyms with locations in 4 states.

I was in TN. When a system would go down in SC or NC, we originally had three options:

1. (The most common) Have them put it in a box and ship it to me.
2. I go there and fix it (rare).
3. I walk them through fixing it over the phone (fuck my life).

I got sick of this. So I researched options and found an open source software solution called FOG. I ran a server in our office and shipped each club a little OptiPlex 160 running the FOG client software. Then each machine at each club was configured to PXE boot from its local FOG client.

The server contained images of every machine we commonly used. I could tell FOG which locations used which models, and it would keep the images cached on the client machines.

If everything was okay, it would chain the boot to the OS on the machine. But I could flag a machine for reimage, and at next boot the machine would check in with the local FOG client via PXE and get a complete reimage from premade images on the FOG server.
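
To make the flow above concrete, here is a minimal Python sketch of the check-in decision: boot normally unless the machine has been flagged, in which case hand it an image. This is not FOG's actual code or API. `ImagingServer`, `flag_for_reimage`, `on_pxe_checkin`, and the model and image names are all hypothetical, and clearing the flag after one deployment is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ImagingServer:
    """Hypothetical stand-in for the imaging server's task list (not FOG's real API)."""

    image_for_model: dict[str, str]                  # hardware model -> premade image name
    flagged: set[str] = field(default_factory=set)   # MAC addresses queued for reimage

    def flag_for_reimage(self, mac: str) -> None:
        """The 'click a checkbox' step: queue a machine for reimaging."""
        self.flagged.add(mac)

    def on_pxe_checkin(self, mac: str, model: str) -> str:
        """Decide what a machine does when it checks in over PXE at boot."""
        if mac in self.flagged:
            # Assumption: the flag is one-shot, so the next boot is a normal one.
            self.flagged.discard(mac)
            return f"deploy:{self.image_for_model[model]}"
        # Nothing queued: chain the boot to the OS already on the local disk.
        return "chain:local-disk"


server = ImagingServer(image_for_model={"front-desk-pc": "frontdesk-gold"})
server.flag_for_reimage("00:11:22:33:44:55")
print(server.on_pxe_checkin("00:11:22:33:44:55", "front-desk-pc"))  # deploy:frontdesk-gold
print(server.on_pxe_checkin("00:11:22:33:44:55", "front-desk-pc"))  # chain:local-disk
```

The point of the design is that the decision is made before any local OS loads, so a machine that cannot boot its own disk can still be reimaged.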

The corporate office was physically connected to one of the clubs, so I trialed the software at our adjacent club, and when it worked great, I rolled it out company wide. It was a massive success.

So yes, I could completely reimage a computer from hundreds of miles away by clicking a few checkboxes on my computer. Since it ran in PXE, the condition of the OS didn’t matter at all. It never loaded the OS when it was flagged for reimage. It would even join the computer to the domain and set up that location’s printers and everything. All I had to tell the low-tech gymbro sales guy on the phone to do was reboot it.

This was free software. It saved us thousands in shipping fees alone. And brought our time to fix down from days to minutes.

There ARE options out there.

[–] Brkdncr@lemmy.world 0 points 3 months ago (9 children)

How removed from IT are you that you think FOG would have helped here?

[–] LrdThndr@lemmy.world 5 points 3 months ago* (last edited 3 months ago) (1 children)

How would it not have? You got an office or field offices?

“Bring your computer by and plug it in over there.” And flag it for reimage. Yeah. It’s gonna be slow, since you have 200 of the damn things running at once, but you really want to go and manually touch every computer in your org?

The damn thing’s even boot looping, so you don’t even have to reboot it.

I’m sure the user saved all their data in OneDrive like they were supposed to, right?

I get it, it’s not a 100% fix rate. And it’s a bit of a callous answer to their data. And I don’t even know if the project is still being maintained.

But the post I replied to was lamenting the lack of an option to remotely fix unbootable machines. This was an option to remotely fix nonbootable machines. No need to be a jerk about it.

But to actually answer your question and be transparent, I’ve been doing Linux devops for 10 years now. I haven’t touched a windows server since the days of the gymbros. I DID say it’s been a decade.

[–] Brkdncr@lemmy.world 4 points 3 months ago (2 children)

Because your imaging environment would also be down. And you’re still touching each machine and bringing users into the office.

Or your imaging process over the WAN takes 3 hours since it’s dynamically installing apps and updates and not a static “gold” image. Imaging is then even slower because your source disk is only an SSD and imaging slows down once you get 10+ going at once.

I’m being rude because I see a lot of armchair sysadmins that don’t seem to understand the scale of the CrowdStrike outage, what CrowdStrike even is beyond antivirus, and the workflow needed to recover from it.

[–] LrdThndr@lemmy.world 6 points 3 months ago (1 children)

FOG ran on Linux. It wouldn’t have been down. But that’s beside the point.

I never said it was a good answer to CrowdStrike. It was just a story about how I did things 10 years ago, and an option for remotely fixing nonbooting machines. That’s it.

I get you’ve been overworked and stressed as fuck these last few days. I’ve been out of corporate IT for 10 years and I do not envy the shit you guys are going through right now. I wish I could buy you a cup of coffee or a beer or something.

[–] Brkdncr@lemmy.world 3 points 3 months ago

Last time I used FOG it was only doing static image deployment, which has been out of style for a while. I don’t know if there are any serious deployment products for Windows enterprise that don’t run on Windows.

I’m personally not dealing with this because I didn’t like how Crowdstrike had answered a number of questions in their sales call.

Avoiding telling me their vuln scan doesn’t probe all hosts after claiming it could replace a real vuln scanner, claiming they are somehow better than others at malware detection without bringing up 3rd party tests, claiming their product was novel when others have been doing the same for 7+ years.

My fave was them telling me how much easier it is to manage but no one on the call had ever worked as a sysadmin or even seen how their competition works.

Shitshow. I’m so glad this happened so I can block their sales team.

[–] timewarp@lemmy.world -2 points 3 months ago (1 children)

Imaging environment down? If a sysadmin can't figure out how to boot a machine into recovery to remove the bad update file then they have bigger problems. The fix in this instance wasn't even re-imaging machines. It was merely removing a file. An ideal DR scenario would have a recovery image already on the system that can be booted into remotely, so there is minimal strain on the network. Furthermore, we don't live in the dial-up age anymore.
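
For reference, a minimal Python sketch of the "remove a file" step. The file pattern and directory come from CrowdStrike's public remediation guidance for this incident, not from this thread. `remove_bad_channel_files` is a hypothetical helper, and it assumes the Windows volume is already unlocked and mounted in whatever recovery environment runs it, which is the hard part when BitLocker is involved.

```python
from pathlib import Path

# Assumption: pattern and location as published in CrowdStrike's remediation
# guidance for the July 2024 incident; neither is stated in this thread.
BAD_CHANNEL_FILE_GLOB = "C-00000291*.sys"
DEFAULT_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")


def remove_bad_channel_files(
    driver_dir: Path = DEFAULT_DRIVER_DIR, dry_run: bool = True
) -> list[Path]:
    """Return the matching channel files; delete them only when dry_run is False."""
    matches = sorted(driver_dir.glob(BAD_CHANNEL_FILE_GLOB))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: list what would be removed without touching anything.
    for found in remove_bad_channel_files():
        print(f"would remove {found}")
```

Defaulting to a dry run is a deliberate choice here: anything that deletes files under `System32` across a fleet should show its work before it does anything.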

[–] Brkdncr@lemmy.world 2 points 3 months ago (1 children)

Imaging environment would be bitlocker’d with its key stuck in AD which is also bitlocker’d.

[–] catloaf@lemm.ee 1 points 3 months ago (1 children)

Only if you're not practicing 3-2-1 with your backups.

[–] Brkdncr@lemmy.world 1 points 3 months ago (1 children)

Backup environment is also bitlocker’d.

[–] catloaf@lemm.ee 1 points 3 months ago

Then you didn't 3-2-1, because you should be able to restore from your alternate format, e.g. tape, without your existing infrastructure. Ideally your second and offsite copies are also offline, so even if you ignored the separate media rule, they wouldn't have been affected by the CrowdStrike update.

Ultimately, nobody should have to tell you not to lock your keys in the car.
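
For anyone unfamiliar with the rule, a rough Python sketch of a 3-2-1 check: at least 3 copies of the data (counting production), on at least 2 kinds of media, with at least 1 offsite. `BackupCopy`, `satisfies_3_2_1`, `has_offline_copy`, and the example plan are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str             # e.g. "disk", "tape", "cloud"
    offsite: bool
    offline: bool = False  # not reachable from the production network


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """3 copies of the data, on at least 2 media types, at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


def has_offline_copy(copies: list[BackupCopy]) -> bool:
    """The 'ideally' part: some copy a bad update cannot reach at all."""
    return any(c.offline for c in copies)


plan = [
    BackupCopy("production", media="disk", offsite=False),
    BackupCopy("backup-server", media="disk", offsite=False),
    BackupCopy("vaulted-tape", media="tape", offsite=True, offline=True),
]
print(satisfies_3_2_1(plan))   # True
print(has_offline_copy(plan))  # True
```

Drop the tape from that plan and both checks fail, which is the "everything is bitlocker'd" situation described above.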
