Most of you have never heard of it before, but work uses it, and today is a really good day not to be in IT.
I’m one of the lucky ones, I just had to reboot from a BSOD.
…on a Friday!
@jouest Yeah my wife is in the Netherlands and said she couldn’t get email this morning. Also planes and trains were stopped over there too. Apparently it hit Australia/New Zealand first. And the strangest thing is it wasn’t a virus, apparently (notwithstanding the huge conspiracy stories that will come up).
I saw the interview (including where he had to grab some water to keep from gagging, but he recovered well). He came off as an honest tech guy I would mostly trust. I worked in tech stuff for 40 years, either working with customers or doing “Software quality assurance” (QA), which we can see can be very important. So it was a fun job; lots of routine stuff punctuated by brief moments of excitement and/or panic. I don’t envy anybody at Crowdstrike this morning. I doubt anyone got much sleep once Australia started to go down.
I heard anecdotal comments that it was 1 line of code and it might even turn out to be. It could be an && where it was supposed to be an ||. Some of you will chuckle. I’ve seen that many times because I was often the guy that debugged it. Sometimes it turns out it was me that made the bad edit!
The difference is that now with massive worldwide deployment that affects so many things it’s a new world and I honestly don’t know that anybody, person, company, university, government, really has a grasp of what that means now.
EDIT: I do hope the bad code edit guy gets an all-expense-paid 1-year sabbatical; he/she will need it. He/she gave us a good warning (unintentionally, I assume) of how vulnerable our stuff is and how we don’t really understand how to manage it.
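For anyone who hasn't been bitten by the && versus || slip, here is a minimal sketch of how small it looks. It is purely illustrative and not a claim about what the actual Crowdstrike bug was. Python spells the operators `and` and `or`, and the rule, the function names, and the two flags are all made up for the example.

```python
from itertools import product


def should_block(is_signed: bool, is_known_good: bool) -> bool:
    """Intended rule (hypothetical): block unless the file is signed AND known good."""
    return not (is_signed and is_known_good)


def should_block_buggy(is_signed: bool, is_known_good: bool) -> bool:
    """Same rule with `or` typed where `and` was meant."""
    return not (is_signed or is_known_good)


# Collect every input where the one-word edit changes the decision.
disagreements = [
    (signed, good)
    for signed, good in product([False, True], repeat=2)
    if should_block(signed, good) != should_block_buggy(signed, good)
]

if __name__ == "__main__":
    print(disagreements)
```

The two versions agree when both flags match and disagree whenever exactly one flag is set, which is why a quick test that only tries the "all good" and "all bad" cases would pass both.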
@jouest @pmarin well said
@pmarin regarding the bad code edit guy, it reminds me of a quote from somebody whose employee had just made some sort of a $600,000 error.
“Fire him? Why would I fire him? I just spent $600,000 training him!”
@jouest @pmarin a good wake-up call for sure.
I definitely don’t blame one guy making a mistake but the entire process.
How wasn’t this failing in any of the testing?
And yeah, there will definitely be conspiracy theories about it, from myself included lol
@jouest @pmarin I haven’t heard anything about anything until now. But mention of subtle errors (like && for ||) reminds me of this:
@jouest @xobzoo This is why we can’t have nice robots.
@jouest @xobzoo So $600K is a typical salary of a Mercatalyst Employee? Damn, looks like I picked the wrong time to retire from tech.
/image Airplane picked the wrong week to quit sniffing glue scene from movie
Today’s haiku
It’s a normal day,
Why does my internet fail?
It’s just some bad code.
Arrived for night flight
But had to sleep on the floor
When will the planes fly?
@pmarin
It was well deserved
The bad code guy vacation
Hopefully still sane
Had to create Windows boot media flash drives, go to 200 thin clients. Boot to USB and then open CMD in advanced boot options to remove the file. Then had to recover a lot of laptops and desktops in safe mode. Fun day.
@riceatusc Eek that is way more serious than it initially sounded.
I do think the BSOD images scattered around the world, including on big screens at Times Square, are going to live on as legend, as a lesson to all developers and software QA people. But really not their fault; the only “safer” strategy is hugely more expensive in terms of equipment and manpower. And the Crowdstrike CEO did stress the urgency of responding quickly to vulnerabilities, and a multi-year testing process doesn’t work in a connected world.
It’s just the world now
But ask how did we get here?
Have to deal with it
@pmarin @riceatusc Multi year testing? That’s a lovely straw man, Mr. CEO. How about ANY testing at all?
@blaineg @pmarin @riceatusc At least the newer BSOD has a frowny face on it so you don’t feel as much anger.
@blaineg @riceatusc @yakkoTDI But Apple, always way ahead of its time, had an icon when the rest of us just got MS-DOS errors, since it was before Windows. Apple’s original icon would probably be controversial today.
/image Apple Macintosh bomb error 1986
EDIT: I think it would be great if, just for kicks, someone put that up on the big outdoor screens at Times Square that had the blue screen error yesterday!
Maybe outside the Apple Store! But they were largely unaffected except for the connected stuff which we are all connected to apparently.
@pmarin @riceatusc @yakkoTDI Amiga had the advantage of color.
I just wanted to use the fuck damned atm
@yakkoTDI Randall is on the ball!
@yakkoTDI That one is great. Forwarded to some friends that recently retired. One guy was apparently doing contract work still and said he had to Fed-Ex his laptop to IT in San Jose.
One is really retired and didn’t care, and his biggest excitement was a large bear in his backyard yesterday, but it’s one of the neighborhood bears; they are mostly cool.
One recently retired and makes Kimchi, and it seems she had no excuse for not working on more Kimchi. Website still seems to be up (shameless plug)
https://queeneykimchi.myshopify.com/
@pmarin @yakkoTDI Neighborhood bear retired and makes kimchi?! That’s pretty spectacular!
@mschuette @yakkoTDI
/image smarter than your average bear
@werehatrack Love it!
Unfortunately I didn’t have any trouble logging into work today. I did have to do some UAT for code that the developers deployed today. Fortunately it went better than Crowdstrike.
Via the BBC:
@blaineg I watched some F1 practice overnight and remembered they are a big sponsor of some teams. It was like “Oh yeah that’s where I heard of Crowdstrike — until yesterday” I’m not fully caught-up with what happened today yet and what they talked about.
They use a huge amount of connectivity between the race event, which moves all over the world, and a team at the “factory”, which is often in the UK, Italy, or Austria, and this happens at all hours of the day since they are often in far-away time zones. I don’t know if some things can even work without connectivity. (Engineering teams at the home base review data and I think control some stuff in real time.) This week is in Hungary, so at least they are close to their primarily European hubs, but if you can’t connect, you can’t connect.
This is the world’s most epic OSHIT report…
@shahnm anybody that had a worthwhile engineering career has experienced that in some form. But NOT like this!
No doubt there are many, many liability lawyers in the U.S. and worldwide, who are swarming like flies on excrement, to grab what they can of the lawsuit free for all that is in active development at the moment.
I think that Crowdstrike is going to get crowd stricken, and how.
I sure hope that Crowdstrike has insurance, but no matter how much it has, it won’t be enough.
I saw a Barron’s blurb that Crowdstrike insiders sold millions of dollars of stock as this debacle was unfolding. I couldn’t read it as it was behind a paywall.
As of this morning, the stock price was down 10%, which is remarkable in that it isn’t a good deal lower.
I expect that this elevator will soon have the down button stuck as the full impact of the damages becomes better defined.
CrowdStrike Holdings, Inc. is an American cybersecurity technology company based in Austin, Texas. It provides cloud workload protection and endpoint security, threat intelligence, and cyberattack response services. Wikipedia
Founded: 2011
Founders: George Kurtz, Dmitri Alperovitch, Gregg Marston
Headquarters: Austin, TX
Number of employees: 8,429 (2024)
Revenue: $2.241 billion USD (2023)
Subsidiaries: CrowdStrike, Inc., Reposify Ltd.
@Jackinga Interesting because the public view is from a “how can this happen? How did this affect me?” But that and lawsuits may not be the main story. When you make a technical product like this (other than airplanes, Ahem!) there is some expectation that faults can and indeed do occur. And generally limited or no legal liability. Exceptions would be known problems hidden and allowed to go on (Boeing, cough cough).
Now it can affect customer interaction and might lead to losing future business or giving big discounts to the affected people in the future (they are the consumers of the product but not you directly).
Honestly I don’t think people stuck in airports or who couldn’t use an ATM will have a real case; this is almost more like ambulance-chaser lawyers than a valid legal case, it seems to me.
@Jackinga I doubt putting that company out of business would be prudent/would happen considering what it does. I’d suspect there is no quick replacement for what they do at the scale they do it. Maybe later down the line, but I suspect the lesser of two evils is to stick with them rather than do without or change over everything in any given company.
@Kidsandliz What are the old expressions, “You never get a second chance to make a first impression,” and “Once burned, twice shy.”
I agree that if one is dependent on Crowdstrike’s products to conduct business, then it will take time to find an alternate supplier. A business can’t just move away overnight. But it is going to be a major boost to the competition, no matter how you cut it.
For businesses that have all their AI eggs in one basket and who have suffered financial damage owing to what amounts to negligence on the part of the supplier, Crowdstrike will suffer in the long term and will face a lot of damage lawsuits demanding compensation for loss of business during the period of service interruption.
Many, if not most, business contracts have a force majeure clause. And if that were the case, then the damages to Crowdstrike would be minimal. It seems to me that this isn’t a case of force majeure, but of negligence, whether willful or not.
Given that Crowdstrike is a ~$2 billion company, they are ripe for attack in the courts by liability law firms looking to extract as much as possible from that pot of potential money.
It will take time to sort all this out, of course. But I have no doubt that with this single event, what was a once trusted provider will be knocked silly by hordes of hungry lawyers looking to consolidate classes of customers into massive suits for damages. Of course, the ones who will profit the most will be the law firms.
I have worked in companies in which force majeure events brought work stoppages and damages and that was painful enough. I haven’t experienced negligence by a supplier, however, during my working life, but I am confident that Crowdstrike is going to suffer in both damages and in loss of business as alternative suppliers are found.
Call it greed, if you will, but it is the American way.
@Jackinga @Kidsandliz I agree about “ the ones who will profit the most will be the law firms.”
And I see what you’re trying to argue but I don’t see this the way you describe it. Engineering faults do occur and there is some assumption of a risk of that, with any product change (be it a new version of hardware or just a software update, which by-the-way often happen very frequently). There will never be a 100% tested and guaranteed (under penalty of lawsuit or imprisonment) technology product, again be it hardware or software. Lately the hardware is often more stable and standardized, but the software changes are often larger in terms of what changed, and more frequent. In some cases daily or weekly. I don’t yet understand the details of Crowdstrike’s technology but it seems like an environment where very frequent updates may be needed to deliver what they intended and the customers expect. This means the year-long testing that I somewhat jokingly proposed earlier is of course impractical. You can’t deliver a security product today that says it protects from all the threats identified in 2022.
But there will be bugs, defects, omissions, unexpected failures, whatever you want to call them. Nobody will pay out millions in lawsuits, any more than you’d get millions if the Chinese toaster you bought at Target doesn’t work. You’ll get a new toaster and/or your money back.
If anything the main thing to examine here is the roll-out mechanism. It’s not even clear to me how this is controlled by Crowdstrike or what is within Microsoft. I’m tending to think the fault (and by this I mean “bad strategy” not “I’ll sue go directly to Jail do not pass Go”) lies within Microsoft as I think their systems installed the updates and were affected immediately. It seems unwise to me that there was not some “test install” on limited systems even if only a few hours before full deployment. When I did software testing it was often “new release just built need to ship later tonight; throw whatever you can think of at it let us know if you find anything”. Not 100% comprehensive by any means, but it helps prevent a situation like this, which we would describe as a DOA and halt distribution immediately. And this was to a very limited technology customer base, not the whole world, not the Formula 1 teams sponsored by Crowdstrike, not the big screens on Times Square, not all the airport screens, not your emails and ATM access.
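A rough sketch of the “test install on limited systems” idea above, written as a staged rollout that halts as soon as an early batch goes DOA. This is not how Crowdstrike or Microsoft actually distribute updates; the function name, the stage fractions, and the `install` callback are all assumptions made up for illustration.

```python
from __future__ import annotations

from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    install: Callable[[str], bool],  # hypothetical: returns True if the host is healthy afterwards
    stages: Sequence[float] = (0.01, 0.10, 1.0),  # assumed cumulative fractions of the fleet
    max_failure_rate: float = 0.0,
) -> tuple[list[str], bool]:
    """Install in growing batches; stop at the first batch that fails too often.

    Returns the hosts that were touched and whether the rollout completed.
    """
    touched: list[str] = []
    start = 0
    for fraction in stages:
        # Each stage covers at least one new host and never runs past the fleet.
        end = min(len(hosts), max(start + 1, int(len(hosts) * fraction)))
        batch = hosts[start:end]
        failures = sum(1 for host in batch if not install(host))
        touched.extend(batch)
        if batch and failures / len(batch) > max_failure_rate:
            return touched, False  # halt distribution, the rest of the fleet is untouched
        start = end
    return touched, True


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(200)]
    # A DOA update: every install fails, so only the first small batch is hit.
    print(staged_rollout(fleet, lambda host: False))
```

With a 200-machine fleet and these made-up stages, a dead-on-arrival update would take down 2 machines instead of 200, at the cost of however long you wait between stages.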
This was essentially the Y2K scenario some feared, but never happened, but in fairness thousands of people worked and many millions of dollars were spent in preparing for that including testing lots of scenarios on different equipment.
If you had to pick Epic Fail 2024 this would be it. At least I hope that remains true; I’d hate to think what else may be in store in the remaining half year.
Thus the results of allowing a software monoculture and participating in it, usually without any consideration for the problem, or for alternative options or platforms. Sure, this time it wasn’t directly Microsoft’s fault (though if the platform was more secure there would be less need for third party bandages). And with most places doing NO due diligence, pursuing NO research on alternatives, and just glazed eye vacant brain accepting the monoculture as the only choice…
Welcome to the world you voted for.
@duodec I like that term software monoculture. I don’t like the concept and that it’s dominant enough to even talk about, but yeah it’s an accurate name. Not that different from how a lot of our domestic food production works with similar risks (except that a lot of us die or starve and can’t be rebooted).
@duodec @pmarin Have you heard of the banana apocalypse (and its potential sequel)?
https://wikipedia.org/wiki/Panama_disease
Humans love unity and harmony. To the point we start inbreeding purebreds.
On Amazon today. I wonder which service?
@blaineg
I’m no expert, but I was peripherally involved in our Crowdstrike deployment a few years ago. It’s a glorified, cloud based, security/antivirus service. It looks more at behaviour than virus/threat databases.
It runs at a very low level, and the file that broke everything was a device driver, and that’s what makes fixing it difficult. If you’re very lucky with a race condition, the updater grabs the fixed file before it gets started. If not, you’ve got to access the disk offline: in another computer; or with a bootable disk, and delete the problem file. This is what makes it a hands-on repair. Most remote tools rely on Windows running, and of course it isn’t.
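The offline fix described above boils down to “mount the disk somewhere else and delete the file that matches a pattern”. A minimal sketch of that step, assuming the broken disk is already mounted; the mount point, the directory, and the file pattern in the usage example are placeholders, not the real Crowdstrike paths. It defaults to a dry run so you can see what would be deleted first.

```python
from __future__ import annotations

from pathlib import Path


def remove_matching_files(
    offline_root: Path,  # where the non-booting disk is mounted (assumed)
    relative_dir: str,   # directory on that disk holding the problem file (placeholder)
    pattern: str,        # glob pattern for the problem file (placeholder)
    dry_run: bool = True,
) -> list[Path]:
    """List files matching the pattern on the offline disk, deleting them unless dry_run."""
    target_dir = offline_root / relative_dir
    if not target_dir.is_dir():
        return []
    matches = sorted(p for p in target_dir.glob(pattern) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Hypothetical mount point and pattern, for illustration only.
    print(remove_matching_files(Path("/mnt/broken-disk"), "vendor/drivers", "bad-*.sys"))
```

Multiply that by 200 thin clients that each need a USB boot first and you can see why it was a “fun day”.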
The “it was just one file” excuse is like saying “it was just one bullet”, or maybe “just one nuke”, judging by the results.
Plain & simple, Crowdstrike failed to do adequate testing before pushing this update.
I used to say that Microsoft has no users, only victims. I guess I need to update that.
@blaineg “I used to say that Microsoft has no users, only victims. I guess I need to update that.”
Now an established industry standard! Thanks Microsoft!
The view from a McDonalds I stopped at Friday… The ‘now serving’ order display box… In a reboot loop