this post was submitted on 13 Nov 2023
32 points (100.0% liked)


So it turns out the cause was indeed a rogue change they couldn't roll back, as we had been speculating.

Weird that whatever this issue is didn't occur in their test environment before they deployed into Production. I wonder why that is.

top 12 comments
[–] DavidDoesLemmy@aussie.zone 30 points 10 months ago

All companies have a test environment. Some companies are lucky enough to have a separate environment for production.

[–] No1@aussie.zone 10 points 10 months ago* (last edited 10 months ago) (1 children)

Change Manager who approved this is gonna be sweating bullets lol

"Let's take a look at the change request. Now, see here, this section for Contingencies and Rollback process? Why is it blank?"

[–] zurohki@aussie.zone 3 points 10 months ago (1 children)

I'm sure there was a rollback process, it just didn't work.

[–] MalReynolds@slrpnk.net 0 points 10 months ago

If you don't test a restored backup, you don't have a backup, same applies here.

[–] unionagainstdhmo@aussie.zone 9 points 10 months ago (1 children)

It's as if they were running Windows 10 Home on their servers and had an upgrade forced onto them

[–] pntha@lemmy.world 4 points 10 months ago

how else do you explain to the layman “catastrophic failure in the configuration update of core network infrastructure and its preceding, meant-to-be-soundproof, processes”

[–] Gamers_Mate@aussie.zone 3 points 10 months ago

This should be their new logo. https://imgur.com/a/bdIUayH

[–] ji88aja88a@lemmy.world 2 points 10 months ago (1 children)

This happens in my business all the time...the test FTP IP address is left in the code and shit falls apart costing us millions... They hold a PIR and then it happens again.

[–] No1@aussie.zone 2 points 10 months ago
[–] autotldr@lemmings.world 1 points 10 months ago

This is the best summary I could come up with:


Optus says "changes to routing information" after a "routine software upgrade" were behind last week's nationwide outage, affecting 10.2 million Australians and impacting 400,000 businesses.

"These routing information changes propagated through multiple layers in our network and exceeded preset safety levels on key routers which could not handle these," the company said.

Before Monday's disclosure by Optus, experts had theorised the outage was likely a "regular software upgrade gone wrong".

"The problem is too widespread to be due to a cable break or equipment failure," said Tom Worthington, a senior lecturer in computer science from the Australian National University in Canberra.

The software upgrade theory surmised by telecommunications analysts and experts last Wednesday was put to Optus CEO Kelly Bayer Rosmarin, who rejected the suggestion.

The reason for the outage follows the federal government announcing earlier on Monday that it would require telecommunications companies in Australia to report their cybersecurity measures to avoid a repeat of Optus' cyber hack last year.


The original article contains 528 words, the summary contains 159 words. Saved 70%. I'm a bot and I'm open source!

[–] SituationCake@aussie.zone 1 points 10 months ago

If this is how they do their routine updates, they have had an extremely lucky run so far. Inadequate understanding of what the update would/could do, inadequate testing prior to deployment, no rollback capability, no disaster recovery plan. Yeah nah, you can’t get that lucky for that long. Maybe they have cut budget or sacked the people who knew what they were doing? Let’s hope they learn from this.

[–] Lophostemon@aussie.zone 1 points 10 months ago

Optus: Yes We Fucked Up